SEQUENCES CHARACTERISTIC OF HUMAN GENE TRANSCRIPTION PRODUCT

Technical Field

The present invention relates to newly identified
polynucleotide sequences corresponding to transcription
products of human genes, and to complete gene sequences
associated therewith.

Background

This invention relates to human genes. Identification
and sequencing of human genes is a major goal of modern
scientific research. The sequence of human genes is more than
just a scientific curiosity. For example, by identifying
genes and determining their sequences, scientists have been
able to make large quantities of valuable human "gene
products." These include human insulin, interferon, Factor
VIII, tumor necrosis factor, human growth hormone, tissue
plasminogen activator, and numerous other compounds.
Additionally, knowledge of gene sequences can provide the key
to treatment or cure of genetic diseases (such as muscular
dystrophy and cystic fibrosis). The present invention
represents a quantum leap forward in mankind's knowledge of
human gene sequences.

There are several basic concepts of molecular biology
which figure prominently in the invention. A brief
summary of these concepts follows.
deoxyribonucleic acid, DNA. (Some viruses contain genes of
ribonucleic acid, RNA.) The genetic information resides in
the particular sequence in which the bases are arranged. A
short sequence of nucleotides is often called a polynucleotide
or an oligonucleotide.

Like genes, polypeptides are built from long strings of
individual units. These units are amino acids. The
nucleotide sequence of a gene tells the cell the sequence in
which to arrange the amino acids to make the polypeptide
encoded by that gene. In general, chains of up to about 200
amino acids are called polypeptides, while proteins are larger
molecules made up of polypeptide subunits; both types of
molecules are referred to generally herein as polypeptides.
A triplet of nucleotides (codon) in DNA codes for each amino
acid or signals the beginning or end of the message
(anticodon). The term codon is also used for the
corresponding (and complementary) sequences of three
nucleotides in the mRNA into which the original DNA sequence
is transcribed.

Generally, enzymes in the cell transcribe the permanent
DNA of the gene into a temporary RNA copy, called messenger
RNA or mRNA. The mRNA, in turn, can be translated into a
polypeptide by the cell. This entire process is called gene
expression, and the polypeptide is the gene product encoded by
the gene.
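
By way of illustration only, the two steps of gene expression
described above (transcription of DNA into mRNA, then translation
of mRNA codons into amino acids) can be sketched in Python. The
15-base sequence and the function names are hypothetical, and the
codon table is deliberately partial: it lists only the
standard-code codons needed for this one example.

```python
# Partial codon table (standard genetic code); only the codons needed
# for this illustration are listed. "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "GCU": "A",  # alanine
    "UGG": "W",  # tryptophan
    "AAA": "K",  # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (same sequence, U replaces T)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA three bases at a time until a stop codon is reached."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


example_gene = "ATGGCTTGGAAATAA"  # hypothetical 15-base coding sequence
example_mrna = transcribe(example_gene)
example_peptide = translate(example_mrna)
print(example_mrna)     # AUGGCUUGGAAAUAA
print(example_peptide)  # MAWK
```

A complete implementation would need all 64 codons; the partial
table raises a KeyError for any codon it does not list.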

Scientists have previously discovered how to reverse the
transcription process and copy mRNA back into DNA using an
enzyme called reverse transcriptase. The resulting DNA is called
complementary DNA, or cDNA. This is schematically shown in
the single Figure. When substantially all of the mRNA from
one cell or tissue is converted to cDNA at once and cloned
into multiple copies of a recombinant vector to allow
replication and manipulation in the laboratory, the result is
called a cDNA library.
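
As a minimal sketch of the base-pairing logic behind reverse
transcription (not of any laboratory protocol), the first cDNA
strand can be modeled as the reverse complement of the mRNA, and
the second strand as the reverse complement of the first. The
function names and the example sequence are hypothetical.

```python
# Base-pairing rules: RNA template -> DNA copy, and DNA -> DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """First-strand cDNA, written 5'->3', complementary to the mRNA."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))


def second_strand_cdna(first_strand: str) -> str:
    """Second strand, written 5'->3', complementary to the first strand."""
    return "".join(DNA_TO_DNA[base] for base in reversed(first_strand.upper()))


mrna = "AUGGCUUGGAAAUAA"  # hypothetical message
first = first_strand_cdna(mrna)
second = second_strand_cdna(first)
print(first)   # TTATTTCCAAGCCAT
print(second)  # ATGGCTTGGAAATAA
```

The second strand carries the same sequence as the mRNA, with T
in place of U, which is why a cDNA clone can stand in for the
message it was copied from.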

The various types of genes include those which code for
polypeptides, those which are transcribed into RNA but are not
translated into polypeptides, and those whose functional
significance does not demand that they be transcribed at all.
Most genes are found on large molecules of DNA located in
chromosomes. Double stranded cDNA carries all the information
of a gene. Each base of the first strand is joined to a
complementary base (hybridized) in the second strand. The
linear DNA molecules in chromosomes have thousands of genes
distributed along their length. Chromosomes include both
coding regions (coding for polypeptides) and noncoding
regions; the coding regions represent only about three percent
of the total chromosome sequence.

An individual gene has regulatory regions that include a 
promoter which directs expression of the gene, a coding region 
which can code for a polypeptide, and a termination signal. 
The regulatory DNA sequence is usually a noncoding region that
determines if, where, when, and at what level a particular 
gene is expressed. 

The coding regions of many genes are discontinuous, with
coding sequences (exons) alternating with noncoding regions
(introns). The final mRNA copy of the gene does not include
these introns (which can be much longer than the coding region
itself), although it does contain certain untranslated regions
that usually do not code for the polypeptide gene product.
Untranslated sequences at the beginning and end of the mRNA
are known as 5'- and 3'-untranslated regions, respectively.
This nomenclature reflects the orientation of the nucleotide
constituents



cDNA is a DNA copy of a messenger RNA, which contains all
of the exons of a gene. The cDNA can be thought of as
having three parts: an untranslated 5' leader, an
uninterrupted polypeptide-coding sequence, and a 3'
untranslated region. The untranslated leader and trailing
sequences are important for initiation of translation, mRNA
stability, and other functions. The untranslated leader and
trailing sequences are called 5'- and 3'-untranslated
sequences, respectively. The 3' untranslated sequence is
usually longer than the 5' untranslated leader, and can be
regions typically have many, randomly-distributed stop codons,
and do not display the nonrandom base arrangements found in
coding sequences. The 5'-untranslated sequence is relatively
short, generally between 20 and 200 bases. The 3'-
untranslated sequence is often many times longer, up to
several thousand bases.

The translated or coding sequence begins with a
translational start codon (AUG or GUG) and ends with a
translational stop codon (UAA, UGA, or UAG). Generally,
translation begins at the first "start" codon on the mRNA and
proceeds to the first "stop" codon. Coding sequences can be
distinguished by their nonrandom distribution of bases;
numerous computer algorithms have been developed to
distinguish coding from noncoding regions in this way.
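
The rule stated above (translation begins at the first start
codon and proceeds to the first in-frame stop codon) can be
sketched as follows. This is an illustration under simplifying
assumptions, not one of the coding-region algorithms referred to
in the text: the function name, the default of AUG as the only
start codon, and the 24-base example are hypothetical.

```python
STOP_CODONS = {"UAA", "UGA", "UAG"}


def first_open_reading_frame(mrna, start_codons=("AUG",)):
    """Return the region from the first start codon through the first
    in-frame stop codon, or None if either is missing."""
    mrna = mrna.upper()
    positions = [mrna.find(codon) for codon in start_codons]
    positions = [p for p in positions if p != -1]
    if not positions:
        return None
    start = min(positions)
    # Step through the message one codon (three bases) at a time.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return None


# Hypothetical message: 5' leader "GGCAC", coding region, 3' trailer "GCGC".
message = "GGCACAUGGCUUGGAAAUAAGCGC"
orf = first_open_reading_frame(message)
print(orf)  # AUGGCUUGGAAAUAA
```

Passing start_codons=("AUG", "GUG") would also accept the second
start codon named in the text.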

Human DNA differs from person to person. No two persons
(except perhaps identical twins) have identical DNA. While
the differences, called allelic variations or polymorphisms,
are slight on a molecular level, they account for most of the
physical and other observable differences between individuals.
It has been estimated that approximately 14 million sequence
polymorphism differences exist between individuals.

The ability of one strand of DNA to attach or hybridize
to a complementary strand has already been exploited for
several purposes. For example, small pieces of DNA (15 to 25
base pairs long) can be made which will hybridize to longer
strands of DNA which have a complementary sequence. These
short "primers" can be selected such that they hybridize to a
specific, unique location on the longer strand. Once the
primers have hybridized to their target on the DNA, the
polymerase chain reaction (PCR) can be employed to generate
millions of copies of (or amplify) the particular segment of
DNA between the locations to which two primers are bound.
Briefly, this technique allows amplification of a DNA region
situated between two convergent primers, using oligonucleotide
primers that hybridize to opposite strands. Primer extension
proceeds inward across the region between the two primers, and
the product of DNA synthesis of one primer serves as a
template for the other primer. Repeated cycles of DNA
denaturation, annealing of primers, and extension result in an
exponential increase in the number of copies of the region
bounded by the primers.
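
The amplification just described can be illustrated with a short
sketch. It assumes an idealized reaction in which every cycle
multiplies the copy number by (1 + efficiency) and in which
primers match their sites exactly; the function names, the
efficiency parameter, and the template and primer sequences are
hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after a given number of cycles."""
    return initial_copies * (1.0 + efficiency) ** cycles


def amplicon(template: str, forward: str, reverse: str):
    """Return the region bounded by two convergent primers, or None.

    The forward primer matches the template strand as written; the
    reverse primer anneals to the opposite strand, so its reverse
    complement is what appears in the template.
    """
    start = template.find(forward)
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start)
    if end == -1:
        return None
    return template[start:end + len(site)]


template = "AAACCCGGGTTTACGTACGTAGGCCTTAAGG"  # hypothetical target
product = amplicon(template, forward="CCCGGG", reverse="AAGGCC")
print(product)            # CCCGGGTTTACGTACGTAGGCCTT
print(pcr_copies(1, 30))  # 1073741824.0
```

Thirty ideal cycles starting from a single molecule give about
10^9 copies, in line with the "millions of copies" described
above; real reactions fall short of perfect doubling.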
Similarly, a labeled segment of single-stranded DNA can
be hybridized to a longer DNA sequence, such as a chromosome,
to mark a specific location on the longer sequence. Segments
of DNA 50 bases long or longer that hybridize to a unique DNA
location in the human genome are extremely unlikely to
hybridize elsewhere in the human genome.
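
A rough calculation suggests why a 50-base segment is so unlikely
to match elsewhere. The sketch below assumes a genome of about
3 x 10^9 base pairs and a model in which the four bases occur
independently and with equal frequency; neither assumption comes
from the text, and real genomes contain repeats that this model
ignores.

```python
def expected_chance_matches(probe_length: int, genome_size: float = 3.0e9) -> float:
    """Expected number of exact matches arising purely by chance.

    Under a uniform random-base model, a given position matches a
    probe of length L with probability (1/4)**L; both strands are
    counted, hence the factor of 2.
    """
    return 2 * genome_size * 0.25 ** probe_length


for length in (15, 20, 50):
    print(length, expected_chance_matches(length))
```

Under these assumptions a 15-base sequence is expected to recur
several times by chance, whereas a 50-base sequence is expected
to recur far less than once.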

The Human Genome Project is an effort to sequence all
human DNA (the human genome). The human genome is estimated
to comprise 50,000 - 100,000 genes, up to 30,000 of which
might be expressed in the brain (Sutcliffe, Ann. Rev.
Neurosci. 11:157 (1988)). Once dedicated human chromosome
sequencing begins in three to five years, it is expected that
12-15 years will be required to complete the sequence of the
genome (Report of the Ad Hoc Program Advisory Committee on
Complex Genomes, Reston, Va., Feb. 1988, D. Baltimore Ed.
(NIH, Bethesda, Md., 1988)). At that rate, the majority of
human genes would remain unknown for at least the next decade.
The present invention can greatly accelerate the pace at which
human genes can be identified and mapped. Most gene
researchers, in conjunction with publication of their results
in this field, submit sequence data to the GenBank database.
Prior to the present invention, GenBank listed the sequences
of only a few thousand human genes and less than two hundred
human brain mRNAs (GenBank Release 66.0, December, 1990).

The role of sequencing complementary DNA (cDNA), reverse
[...] Thus, some have argued that [...] genome sequencing
[...] However, until [...] regulatory sequences, will be missed
(Report of the Committee on [...] Genome, National Research
Council (National [...] Washington, D.C. 19[..])). [...]
regions of the genome using [...] of cDNA [...] considered
impractical or un[...] repetitive elements, [...] were believed
to be [...] other nuclear [...] mitochondrial genes, ribosomal
RNA genes, [...] genes comprising common or [...] sequences
[...] believed that cDNA libraries would provide few [...]
regulatory polypeptides or [...] corresponding to [...]
muscle cDNA library and identified clones [...]
19 known muscle polypeptides, including one new isotype [...]
unknown coding [...]

Another perceived drawback of cDNA sequencing was that
some mRNAs are abundant, and some are rare [...] cellular
quantities of [...] and useless.

The present invention demonstrates that, despite such [...]
pharmaceutical industries. Not only can [...] isolate entire
[...] with conventional, well un[...]
genes, and to determine the chromosomal locations and
biological functions of these genes. As is demonstrated here,
fragments of only a few hundred bases are sufficient, in many
cases, to identify the probable function of a new human gene
if it is similar in structure to a gene from another animal,
or from plants or bacteria. Similarly, even fragments of
untranslated regions of a cDNA can be used to: i) isolate the
coding sequence of the cDNA; ii) isolate the complete gene;
iii) determine the position of the gene on a human chromosome,
and hence the potential of the gene to cause a human genetic
disease; and iv) determine the function of the gene by means
of experiments in which the function of the native gene is
disrupted by the addition of a short DNA fragment to the cell,
e.g., using triple helix or antisense probes.

Because coding regions comprise such a small portion of
the human genome, identification and mapping of transcribed
regions and coding regions of chromosomes is of significant
interest. There is a corresponding need for reagents for
identifying and marking coding regions and transcribed regions
of chromosomes. Furthermore, such human sequences are valuable
for chromosome mapping, human identification, identification
of tissue type and origin, forensic identification, and
locating disease-associated genes (i.e., genes that are
associated with an inherited human disease, whether through
mutation, deletion, or faulty gene expression) on the
chromosome.

SUMMARY OF THE INVENTION

Contrary to the expectations of the scientific community,
cDNA screening and sequencing techniques have now been used to
discover a large number of heretofore unknown human genes.
Disclosed herein are over 2[...] new human polynucleotide
sequences. The novelty of these sequences has been
[...] of the present invention were ascertained using a fast
approach to cDNA [...]. This approach [...] facilitate [...]
expressed human genes within a few years [...] cost of complete
genomic sequencing, provide new [...] markers, provide new
[...]-based therapeutics and diagnostics, and provide other
valuable nucleotide reagents.

The sequences disclosed herein, styled Expressed Sequence
Tags (ESTs), are markers for human genes actually [...].
Techniques are disclosed for using these [...] gene. The use of
ESTs, complete coding sequences, or fragments thereof for
marking individual chromosomes, for mapping locations of
expressed genes on chromosomes, for individual or forensic
identification, for mapping locations of disease-associated
genes, for identification of tissue type, [...] preparation of
antisense sequences, probes, and constructs is discussed in
detail below. Unlike the random genomic DNA sequence tagged
sites (STSs) (Olson et al., Science 245:1434 (1989)), ESTs
point directly to expressed genes.

[...] individual ESTs, [...] genomic DNA, mRNA, antisense
strands, [...] expression vectors and polypeptide expression
products, they are also within the scope of the present
invention, along with antibodies, especially monoclonal
antibodies, to [...] expression products.

BRIEF DESCRIPTION OF THE DRAWING

The single drawing Figure schematically illustrates the
progression from chromosome to gene to mRNA to cDNA.
The detailed description that follows provides not only
the actual sequence of each new EST, but also explains how the
ESTs were obtained, how to obtain the corresponding complete
cDNA sequence and the corresponding genomic DNA sequence, how
to make DNA constructs from the ESTs and corresponding
sequences, how to use those sequences as reagents in molecular
biology and other fields, how to produce gene products from
the ESTs and corresponding sequences and antibodies to those
gene products, and the functional categories of many ESTs and
corresponding genes. Furthermore, numerous actual working
examples and predictive examples are provided to demonstrate
and exemplify numerous aspects of the invention.

I. ESTs from cDNA Libraries

The sequences of the present invention were isolated from
commercially available and custom made cDNA libraries using a
rapid screening and sequencing technique. In general, the
method comprises applying conventional automated DNA
sequencing technology to screening clones, advantageously
randomly selected clones, from a cDNA library. Preferably,
the library is initially "enriched" through removal of
ribosomal sequences and other common sequences prior to clone
selection. According to the present method, ESTs are
generated from partial DNA sequencing of the selected clones.
The ESTs of the present invention were generated using low
redundancy of sequencing, typically a single sequencing
reaction. While single sequencing reactions may have an
accuracy as low as 97%, this nevertheless provides sufficient
fidelity for identification of the sequence and design of PCR
primers.
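
To put the worst-case 97% figure in perspective, the sketch below
computes the expected number of miscalled bases in a read and the
chance that a short stretch is free of errors. It assumes that
errors occur independently at each base, which is a simplification
not stated in the text; the 300-base read and 20-base primer
lengths are illustrative choices only.

```python
def expected_errors(length: int, per_base_accuracy: float = 0.97) -> float:
    """Expected number of miscalled bases in a read of the given length."""
    return length * (1.0 - per_base_accuracy)


def prob_error_free(length: int, per_base_accuracy: float = 0.97) -> float:
    """Probability that a stretch contains no miscalled base at all,
    assuming independent errors."""
    return per_base_accuracy ** length


print(expected_errors(300))  # about 9 of 300 bases miscalled
print(prob_error_free(20))   # about 0.54 for a 20-base primer site
```

Even at the lower bound of accuracy, roughly 291 of 300 bases are
expected to be correct, which leaves ample signal for a database
comparison. A primer chosen from such a read may carry a mismatch,
so a region of clean sequence would ordinarily be chosen for
primer design.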

Most human genes can be identified by EST sequencing of
libraries of cDNA copies of messenger RNAs. However, some
genes are expressed only at specific times during embryonic
development, or only in small amounts in a few specific cell
types. Other genes have mRNAs that are degraded very quickly
in the cell in which they are expressed. If any of these are
the case, transcripts of the gene will not be represented in
cDNA libraries so the gene will not be identifiable by EST
sequencing. A new method called "exon amplification",
however, can be used to isolate and identify transcripts of
such genes.

Exon amplification works by artificially expressing part
or all of a gene that is contained in a cloned fragment of
genomic DNA such as a cosmid or yeast artificial chromosome
(YAC). The gene is cloned into a special vector, designed at
MIT, that uses control elements from virus genes to express
the protein-coding exons of the human gene of interest. Exon
trapping shows considerable promise as a general technique for
identifying those genes in the human genome that cannot be
found by cDNA cloning and EST sequencing. Exon amplification
will also be useful for identifying the genes in regions of
genomic DNA to which disease genes have been mapped. The exon
amplification method can be used directly with the cosmid and
YAC clones from human chromosomes that are being obtained by
both NIH and DOE supported human genome centers.

ESTs comprise DNA sequences corresponding to a portion of
nuclear encoded messenger RNA. An EST is of sufficient length
to permit: (1) amplification of the specific sequence from a
cDNA library, e.g., by polymerase chain reaction (PCR); (2)
use of a synthetic polynucleotide corresponding to a partial
or complete sequence of the EST as a hybridization probe of a
cDNA library, generally having 30 - 50 base pairs; or (3)
unique designation of the pure cDNA clone from which the EST
was derived (the EST clone) for use as a hybridization probe
of a cDNA library. Preferably, EST-derived primer pairs and
sequences amplify or detectably hybridize to a sequence from
a genomic library.

It has been found that sufficient information is
contained in the 150-400 base ESTs from one sequencing run to
effect preliminary identification and exact chromosome
mapping. Accordingly, the ESTs disclosed herein are generally
at least 150 base pairs in length. The length of an EST is
determined by the quality of sequencing data and the length of
the cloned cDNA. Raw data from the automated sequencers is
edited to remove low quality sequence at the end of the
sequencing run. High quality sequences (usually a result of
sequencing templates without excessive salt contamination)
generally give about 400 bp of reliable sequence data; other
sequences give fewer bases of reliable data. A 150 bp EST is
long enough to be translated into a 50 amino acid peptide
sequence. This length is sufficient to observe similarities
when they exist in a database search. Furthermore, 150 bp is
long enough to design PCR primers from each end of the
sequence to amplify the complete EST. Sequences shorter than
150 bp are difficult to purify and use following PCR
amplification. Furthermore, a 150 bp polynucleotide is likely
to give a very strong signal with low background in a screen
of a genomic library.
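
The editing step and the 150 bp threshold described above can be
sketched as follows. The trimming rule is an assumption made for
illustration (the text does not specify how low-quality sequence
was recognized): the read is cut where a 10-base window first
contains more than one ambiguous "N" call. All names and the
example read are hypothetical.

```python
MIN_EST_LENGTH = 150  # minimum useful EST length stated in the text


def trim_low_quality_tail(read: str, window: int = 10, max_n: int = 1) -> str:
    """Cut the read where a window first contains more than max_n 'N' calls."""
    read = read.upper()
    for start in range(len(read) - window + 1):
        if read[start:start + window].count("N") > max_n:
            return read[:start].rstrip("N")
    return read.rstrip("N")


def is_usable_est(read: str) -> bool:
    """True if the trimmed read meets the 150-base minimum."""
    return len(read) >= MIN_EST_LENGTH


def max_peptide_length(read: str) -> int:
    """Number of whole codons, i.e. the longest peptide one frame can encode."""
    return len(read) // 3


raw_read = "ACGT" * 40 + "NNANNCNNGN"  # hypothetical 170-base raw read
est = trim_low_quality_tail(raw_read)
print(len(est), is_usable_est(est), max_peptide_length(est))  # 152 True 50
```

A 150-base read contains exactly 50 codons, which is the 50 amino
acid peptide referred to above.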

Finally, it is highly unlikely that a sequence of the
same 150 bp exists in any genes in the genome besides the one
[...] by the EST. Some closely related gene family members
have very similar nucleotide sequences, but no examples of
pairs of human genes with long segments of identical sequence
have been reported to date. For instance, there are three
known β-tubulin genes in humans. Several ESTs were found that
matched one or another of these tubulin genes, but several new
members of this gene family were also found and could be
clearly distinguished from the three known members. ESTs that
match perfectly to several different genes can be detected by
hybridizing to chromosomes: if many chromosomal loci are
observed, the sequence (or a close variant) is present in more
than one gene. This problem can be circumvented by using the
3'-untranslated part of the cDNA alone as a probe for the
[...] regions, as detailed below. In this manner, previously
unknown genes can be identified.

While a variety of cDNA libraries can be used to obtain
ESTs, human brain cDNA libraries are exemplified and represent
a preferred embodiment. Suitable cDNA libraries can be
freshly prepared or obtained commercially, e.g., as shown in
Examples 1 and 9. The cDNA libraries from the desired tissue
are preferably preprocessed by conventional techniques to
reduce repeated sequencing of high and intermediate abundance
clones and to maximize the chances of finding rare messages
from specific cell populations. Preferably, preprocessing
includes the use of defined composition prescreening probes,
e.g., cDNA corresponding to mitochondria, abundant sequences,
ribosomes, actins, myelin basic polypeptides, or any other
known high abundance peptide; these prescreening probes used
for preprocessing are generally derived from known ESTs.
Other useful preprocessing techniques include subtraction,
which preferentially reduces the population of certain
sequences in the library (e.g., see A. Swaroop et al., Nucl.
Acids Res. 19:1954 (1991)), and normalization, which results
in all sequences being represented in approximately equal
proportions in the library (Patanjali et al., Proc. Natl. Acad.
Sci. USA 88:1943 (1991)).

The cDNA libraries used in the present method will
ideally use directional cloning methods so that either the 5'
end of the cDNA (likely to contain coding sequence) or the 3'
end (likely to be a non-coding sequence) can be selectively
obtained.

Libraries of cDNA can also be generated from recombinant
expression of genomic DNA. After they are amplified, ESTs can
be obtained and sequenced, e.g., as illustrated in Example 9.

The sequences of the present invention include the
specific sequences set forth in the Sequence Listing and
designated SEQ ID NO: 1 - SEQ ID NO: 315. In one aspect of
this embodiment, the invention relates to those sequences of
SEQ ID NOS: 1 - 315 that comprise the cDNA coding sequences
for polypeptides having less than 95% identity with known
amino acid sequences (see Table 2) and more preferably less
than 90% or 85% identity. In a second aspect, the invention
relates to those sequences of SEQ ID NOS: 1 - 315 that encode
polypeptides having no similarity to known amino acid
sequences (see Examples that follow). Precisely because they
do not contain coding regions and are therefore more unique in
their sequence structures, those sequences which meet neither
of the preceding criteria can be most useful and are generally
preferred for mapping.

Consistent with the NIH mission and its responsibilities
to disseminate knowledge and share the tangible fruits of its
research, the present inventors have taken a number of steps
to facilitate sequence data and clone availability. All EST
sequences have been submitted to GenBank. The corresponding
cDNA clones have been submitted to the American Type Culture
Collection and information on clones and sequences has been
submitted to the Genome Data Base (Pearson, P. Nucl. Acids
Res. 19 (Suppl.): 2237-9 (1991)).

II. Complete Coding Sequences from ESTs

The ESTs of the present invention generally represent
relatively small coding regions or untranslated regions of
human genes. Although most of these sequences do not code
[...] highly specific markers for the corresponding complete coding
regions. The ESTs are of sufficient length that they will
hybridize, under stringent conditions, only with DNA for that
gene to which they correspond. Suitably stringent conditions
comprise conditions, for example, where at least ?5%,
preferably at least 97% or 98% identity (base pairing), is
[...] sequence. Therefore, only routine laboratory work is
probe cDNA and purify them by known purification methods.

5. Nucleotide sequence the ends of the newly purified
clones to identify full length sequences.

6. Perform complete sequencing of full length clones
by Exonuclease III digestion or primer walking. Northern
blots of the mRNA from various tissues using at least part of
the EST clone as a probe can optionally be performed to check
the size of the mRNA against that of the purported full length
cDNA.

An EST is a specific tag for a messenger RNA molecule.
The complete sequence of that messenger RNA, in the form of
cDNA, can be determined using the EST as a probe to identify
a cDNA clone corresponding to a full-length transcript,
followed by sequencing of that clone. The EST or the full-
length cDNA clone can also be used as a probe to identify a
genomic clone or clones that contain the complete gene
including regulatory and promoter regions, exons, and introns.

ESTs are used as probes to identify the cDNA clones from
which an EST was derived. ESTs, or portions thereof, can be
nick-translated or end-labelled with 32P using polynucleotide
kinase and labelling methods known to those with skill in the
art (Basic Methods in Molecular Biology, L.G. Davis, M.D.
Dibner, and J.F. Battey, ed., Elsevier Press, NY, 1986). The
lambda library can be directly screened with the labelled ESTs
of interest or the library can be converted en masse to
pBluescript (Stratagene, La Jolla, California) to facilitate
bacterial colony screening. Both methods are well known in
compared to duplicate plates of colonies or plaques; each
exposed spot corresponds to a positive colony or plaque. The
colonies or plaques are selected, expanded and the DNA is
isolated from the colonies for further analysis and
sequencing.

The ESTs can additionally be used to screen Northern
blots of mRNA obtained from various tissues or cell cultures,
including the tissue of origin of the EST clone. Northern
analysis will most often produce one to several positive
bands. The bands can be selected for further study based on
the predicted size of the mRNA.

Positive cDNA clones in phage lambda are analyzed to
determine the amount of additional sequence they contain using
PCR with one primer from the EST and the other primer from the
vector. Clones with a larger vector-insert PCR product than
the original EST clone are analyzed by restriction digestion
and DNA sequencing to determine whether they contain an insert
of the same or similar size as the mRNA on a Northern
blot.

Once one or more overlapping cDNA clones are identified,
the complete sequence of the clones can be determined. The
preferred method is to use exonuclease III digestion
(McCombie, W.R., Kirkness, E., Fleming, J.T., Kerlavage, A.R.,
Iovannisci, D.K., and Martin-Gallardo, R., Methods 3: 33-40,
1991). A series of deletion clones is generated, each of
which is sequenced. The resulting overlapping sequences are
assembled into a single contiguous sequence of high redundancy
(usually three to five overlapping sequences at each
nucleotide position), resulting in a highly accurate final
sequence.
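
The benefit of redundancy can be shown with a minimal sketch that
takes reads already placed in register (padded with "-" where a
read does not cover a position) and reports the most common base
in each column. The majority-vote rule, the function name, and
the three example reads are assumptions for illustration; the
text does not describe the assembly procedure at this level.

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority-vote consensus of pre-aligned reads; '-' means no coverage."""
    length = max(len(read) for read in aligned_reads)
    result = []
    for pos in range(length):
        column = [read[pos] for read in aligned_reads
                  if pos < len(read) and read[pos] != "-"]
        if not column:
            result.append("N")  # position not covered by any read
            continue
        # most_common(1) gives the most frequent base; on a tie, the
        # base seen first in the column is reported.
        result.append(Counter(column).most_common(1)[0][0])
    return "".join(result)


reads = [
    "ACGTTGCA----",
    "--GTAGCATG--",  # hypothetical miscall (A for T) at the fifth position
    "----TGCATGCC",
]
print(consensus(reads))  # ACGTTGCATGCC
```

The single miscalled base is outvoted by the two reads that agree,
which is the sense in which three- to five-fold coverage yields a
more accurate final sequence than any one read.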

A similar screening and clone selection approach can be
applied to obtaining cosmid or lambda clones from a genomic
DNA library that contains the complete gene from which the EST
was derived (Kirkness, E.F., Kusiak, J.W., Menninger, J.,
Gocayne, J.D., Ward, D.C., and Venter, J.C. Genomics 10: 985-995
(1991)). Although the process is much more laborious,
these genomic clones can also be sequenced in their entirety.
A shotgun approach is preferred to sequencing clones with
inserts longer than 10 kb (genomic cosmid and lambda clones).

In shotgun sequencing, the clone is randomly broken into
many small pieces, each of which is partially sequenced. The
sequence fragments are then aligned to produce the final
contiguous sequence with high redundancy. An intermediate
approach is to sequence just the promoter region and the
intron-exon boundaries and to estimate the size of the introns
by restriction endonuclease digestion (ibid.).
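
The alignment of shotgun fragments into one contiguous sequence
can be illustrated with a simple greedy procedure: repeatedly
merge the two fragments that share the longest exact overlap.
This is a toy sketch under strong assumptions (error-free
fragments, a single strand, no repeats), not the assembly method
used for the work described here; the names and the fragments
are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len: int = 3):
    """Merge fragments pairwise by best overlap until none remains."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps; leave separate contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)] + [merged]
    return frags


pieces = ["GCAATGCC", "ATGGCGTG", "GCGTGCAAT"]  # hypothetical fragments
print(greedy_assemble(pieces))  # ['ATGGCGTGCAATGCC']
```

Greedy merging can misassemble repeated regions, which is one
reason high redundancy is wanted in practice.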

Using the sequence information provided herein, the
polynucleotides of the present invention can be derived from
natural sources or synthesized using known methods. The
sequences falling within the scope of the present invention
are not limited to the specific sequences described, but
include human allelic and species variations thereof and
portions thereof of at least 15-18 bases. (Sequences of at
least 15-18 bases can be used, for example, as PCR primers or
as DNA probes.) In addition, the invention includes the
entire coding sequence associated with the specific
polynucleotide sequence of bases described in the Sequence
Listing, as well as portions of the entire coding sequence of
at least 15-18 bases and allelic and species variations
thereof. Furthermore, to accommodate codon variability, the
invention includes sequences coding for the same amino acid
sequences as do the specific sequences disclosed herein.
Finally, although the error rate in the automated sequencing
used in the present invention is small, there remains some
chance of error. Therefore, claims to particular sequences
should not be so narrowly construed as to require inclusion of
erroneously identified bases or to exclude corrections.

Any specific sequence disclosed herein can be readily
screened for errors by resequencing each EST in both
directions (i.e., sequence both strands of cDNA).

The sequences, constructs, vectors, clones, and other
materials comprising the present invention can advantageously
be in enriched or isolated form. As used herein, "enriched"
[...] 2, 5, 10, 100, or 1000 times its natural concentration (for
example), advantageously 0.01%, by weight, preferably at least
about 0.1% by weight. Enriched preparations of about 0.5%,
1%, 5%, 10%, and 20% by weight are also contemplated.
Further, removal of clones corresponding to ribosomal RNA and
"housekeeping" genes and clones without human cDNA inserts
results in a library that is "enriched" in the desired clones.

The term "isolated" requires that the material be removed 
from its original environment (e.g., the natural environment 
if it is naturally occurring) . For example, a naturally- 
occurring polynucleotide present in a living animal is not 
isolated, but the same polynucleotide, separated from some or 
all of the coexisting materials in the natural system, is 
isolated. 

It is also advantageous that the sequences be in purified 
form. The term "purified" does not require absolute purity; 
rather, it is intended as a relative definition. Individual 
EST clones isolated from a cDNA library have been 
conventionally purified to electrophoretic homogeneity. The 
sequences obtained from these clones could not be obtained 
directly either from the library or from total human DNA. The 
cDNA clones are not naturally occurring as such, but rather 
are obtained via manipulation of a partially purified 
naturally occurring substance (messenger RNA) . The conversion 
of mRNA into a cDNA library involves the creation of a 
synthetic substance (cDNA) and pure individual cDNA clones can 
be isolated from the synthetic library by clonal selection. 
Thus, creating a cDNA library from messenger RNA and 
subsequently isolating individual clones from that library 
results in an approximately 10^6-fold purification of the
native message. Purification of starting material or natural 
material to at least one order of magnitude, preferably two or 
three orders, and more preferably four or five orders of 
magnitude is expressly contemplated. 

In a cDNA library there are many species of mRNA
represented. Each cDNA clone can be interesting in its own
right, but must be isolated from the library before further
experimentation can be completed. In order to sequence any
specific cDNA, it must be removed and separated (i.e., isolated
and purified) from all the other sequences. This can be
accomplished by many techniques known to those of skill in the
art. These procedures normally involve identification of a
bacterial colony containing the cDNA of interest and further 
amplification of that bacteria. Once a cDNA is separated from 
the mixed clone library, it can be used as a template for 
further procedures such as nucleotide sequencing. 

Although claims to large numbers of ESTs and
corresponding sequences are presented herein, the invention is
not limited to these particular groupings of sequences. Thus,
individual sequences are considered as applicants' discoveries
or inventions, as are subgroupings of sequences. All of the
functional subgroupings set forth in the tables define
groupings for which separate claims are contemplated as being
within the scope of this invention. Moreover, in addition to
claims to individual clones, it is intended that the present
disclosure also support claims to numerical subgroupings.
Thus, subgroupings of 50 ESTs (and corresponding sequences)
are contemplated (e.g., SEQ ID NOS 1-50, 51-100, 101-150,
etc.) as being within the scope of this invention, as are
subgroupings of 5, 10, 25, 100, 200, and 300 ESTs and
corresponding sequences.

III. DNA Constructs

The present invention also includes recombinant
constructs comprising one or more of the sequences as broadly
described above. The constructs comprise a vector, such as a
plasmid or viral vector, into which a sequence of the
invention has been inserted, in a sense or antisense
orientation. In a preferred aspect of this embodiment, the
construct further comprises regulatory sequences, including,
for example, a promoter, operably linked to the sequence.
[...] and promoters are known [...]

Bacterial: pBs, phagescript, PhiX174, pBluescript SK, pBs KS,
pNH8a, pNH16a, pNH18a, pNH46a (Stratagene); pTrc99A, pKK223-3,
pKK233-3, pDR540, pRIT5 (Pharmacia).

Eukaryotic: pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene);
pSVK3, pBPV, pMSG, pSVL (Pharmacia).

Promoter regions can be selected from any desired gene
using CAT (chloramphenicol acetyltransferase) vectors or other
vectors with selectable markers. Two appropriate vectors are
pKK232-8 and pCM7. Particular named bacterial promoters
include lacI, lacZ, T3, T7, gpt, lambda PR, and trc.
Eukaryotic promoters include CMV immediate early, HSV
thymidine kinase, early and late SV40, LTRs from retrovirus,
and mouse metallothionein-I. Selection of the appropriate
vector and promoter is well within the level of ordinary skill 
in the art. 

In a further embodiment, the present invention relates to 
host cells containing the above-described construct. The host 
cell can be a higher eukaryotic cell, such as a mammalian 
cell, or a lower eukaryotic cell, such as a yeast cell, or the
host cell can be a prokaryotic cell, such as a bacterial cell.
Introduction of the construct into the host cell can be
effected by calcium phosphate transfection, DEAE dextran
mediated transfection, or electroporation (Davis, L., Dibner,
M., Battey, I., Basic Methods in Molecular Biology, (1986)).

The constructs in host cells can be used in a
conventional manner to produce the gene product coded by the
recombinant sequence. Alternatively, the encoded polypeptide
can be synthetically produced by conventional peptide
synthesizers.

Certain ESTs have already been preliminarily categorized
by analogy to related sequences in other organisms (see Table
2). Table 10 of Example 8 categorizes particular ESTs broadly
as metabolic, regulatory, and structural sequences where
known. Constructs comprising genes or coding sequences
corresponding to each of these categories are, therefore,
specifically and individually contemplated.

Table 11 more particularly separates 27 new ESTs into 11
categories using different criteria. These are genes
related to cell surface; developmental control; energy
metabolism; kinase and phosphatase; oncogenes; peptidases and
peptidase inhibitors; receptors; structural and cytoskeletal;
signal transduction; transcription, translation, and
subcellular localization; and transcription factors. Table 11
further identifies the EST by the particular gene product for
which it apparently codes. Each of these categories
individually comprises a preferred category of EST, and
preferred constructs and resulting polypeptides can be prepared
from those ESTs or the corresponding complete gene sequence.

IV. ESTs and Corresponding Sequences as Reagents

Each of the cDNA sequences identified herein (and the
corresponding complete gene sequences) can be used in numerous
ways as polynucleotide reagents. The sequences can be used as
diagnostic probes for the presence of a specific mRNA in a
particular cell type. In addition, these sequences can be 
used as diagnostic probes suitable for use in genetic linkage 
analysis (polymorphisms). Further, the sequences can be used 
as probes for locating gene regions associated with genetic 
disease, as explained in more detail below. 

The EST and complete gene sequences of the present
invention are also valuable for chromosome identification.
Each sequence is specifically targeted to and can hybridize
with a particular location on an individual human chromosome.
Moreover, there is a current need for identifying particular
sites on the chromosome. Few chromosome marking reagents
based on actual sequence data (repeat polymorphisms) are
presently available for marking chromosomal location. The
present invention constitutes a major expansion of available
chromosome markers.

Using the techniques described in Example 3 or 4, ESTs
and their corresponding complete sequences can be mapped to
[...] in correlating those sequences with genes associated with
disease.

Briefly, sequences can be mapped to chromosomes by
preparing PCR primers [...] from the ESTs. [...]
computer analysis of the [...] rapidly select
primers that do not [...] amplification process. These [...]
labeled flow-sorted chromosomes, [...] hybridization to construct
[...] cDNA libraries.

Results of mapping ESTs to chromosomal segments are listed in
Tables 3 and 4.

Fluorescence in situ hybridization (FISH) of a cDNA clone
to a metaphase chromosomal spread [...] to provide a
precise chromosomal location in one [...]. This technique can
be used with cDNA as short as [...]; however,
clones larger than 2,000 bp have a higher likelihood of
binding to a unique chromosomal location with sufficient
signal intensity for [...]. FISH requires use of
the clone from which the EST was derived, [...]
better. 2,000 bp is good, 4,000 [...] more than
[...] results a [...]
Basic Techniques, Pergamon Press, New York (1988).

Reagents for chromosome mapping can be used individually
(to mark a single chromosome or a single site on that
chromosome) or as panels of reagents (for marking multiple
sites and/or multiple chromosomes). Reagents corresponding to
noncoding regions of the genes actually are preferred for
mapping purposes. Coding sequences are more likely to be
conserved within gene families, thus increasing the chance of
cross hybridizations during chromosomal mapping (see Tables 8
and 9).

Once a sequence has been mapped to a precise chromosomal 
location, the physical position of the sequence on the 
chromosome can be correlated with genetic map data. (Such
data are found, for example, in V. McKusick, Mendelian
Inheritance in Man (available on-line through Johns Hopkins
University Welch Medical Library).) The relationships between
genes and diseases that have been mapped to the same
chromosomal region are then identified through linkage
analysis (coinheritance of physically adjacent genes).

Next, it is necessary to determine the differences in the
cDNA or genomic sequence between affected and unaffected
individuals. If a mutation is observed in some or all of the
affected individuals but not in any normal individuals, then
the mutation is likely to be the causative agent of the
disease.

With current resolution of physical mapping and genetic
mapping techniques, a cDNA precisely localized to a
chromosomal region associated with the disease could be one of
between 50 and 500 potential causative genes. (This assumes
1 megabase mapping resolution and one gene per 20 kb.)
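
The estimate above is a ratio of region size to gene density. The following is a minimal sketch of that arithmetic, assuming genes are spaced uniformly at one gene per 20 kb as stated; the function name and the 1 Mb and 10 Mb region sizes are illustrative assumptions, not values taken from the mapping data.

```python
def candidate_gene_count(region_bp: int, bp_per_gene: int = 20_000) -> int:
    """Estimate how many genes lie in a mapped region, assuming uniform gene density."""
    if region_bp <= 0 or bp_per_gene <= 0:
        raise ValueError("region size and gene spacing must be positive")
    return region_bp // bp_per_gene


if __name__ == "__main__":
    # Hypothetical mapping resolutions, in megabases.
    for megabases in (1, 10):
        genes = candidate_gene_count(megabases * 1_000_000)
        print(f"{megabases} Mb region: about {genes} candidate genes")
```

Under these assumptions a 1 Mb region holds about 50 candidate genes and a 10 Mb region about 500, which brackets the range quoted in the text.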

Comparison of affected and unaffected individuals
generally involves first looking for structural alterations in
the chromosomes, such as deletions or translocations that are
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polymorphisms . 

In addition to the foregoing, the sequences of the
invention, as broadly described, can be used to control gene
expression through triple helix formation or antisense DNA or
RNA, both of which methods are based on binding of a
polynucleotide sequence to DNA or RNA. Polynucleotides
suitable for use in these methods are usually 20 to 40 bases
in length and are designed to be complementary to a region of
the gene involved in transcription (triple helix - see Lee et
al., Nucl. Acids Res. 6: 3073 (1979); Cooney et al., Science
241: 456 (1988); and Dervan et al., Science 251: 1360 (1991))
or to the mRNA itself (antisense - Okano, J. Neurochem. 56:
560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of
Gene Expression, CRC Press, Boca Raton, FL (1988)). Triple
helix formation optimally results in a shut-off of RNA
transcription from DNA, while antisense RNA hybridization
blocks translation of an mRNA molecule into polypeptide. Both
techniques have been demonstrated to be effective in model
systems. Information contained in the sequences of the
present invention is necessary for the design of an antisense
or triple helix oligonucleotide.
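
As an illustration of how sequence information feeds antisense design, the sketch below derives an oligonucleotide complementary to a chosen window of a transcript. It assumes the transcript is written in the DNA alphabet (T rather than U) and enforces the 20 to 40 base length mentioned above; the function names and the idea of picking the window by a start index are hypothetical simplifications, and real designs weigh many additional factors.

```python
# Complement map for the DNA alphabet (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def antisense_oligo(transcript: str, start: int, length: int = 20) -> str:
    """Return an oligo complementary to transcript[start:start+length]."""
    if not 20 <= length <= 40:
        raise ValueError("length should be 20 to 40 bases")
    target = transcript[start:start + length]
    if start < 0 or len(target) != length:
        raise ValueError("target window falls outside the transcript")
    # The antisense strand pairs with the target, so it is its reverse complement.
    return reverse_complement(target)


if __name__ == "__main__":
    example = "ATGGCCAAGCTTGGATCCGAATTCCTGCAGGTCGAC"  # made-up sequence
    print(antisense_oligo(example, start=0, length=20))
```

Taking the reverse complement of the returned oligo recovers the targeted window, which is a quick check that the two strands pair.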

The present invention is also a useful tool in gene
therapy, which requires isolation of the disease-associated
gene in question as a prerequisite to the insertion of a
normal gene into an organism to correct a genetic defect. The
high specificity of the cDNA probes according to this
invention has promise of targeting such gene locations in a
highly accurate manner.

The sequences of the present invention, as broadly
defined, are also useful for identification of individuals
from minute biological samples. The United States military,
for example, is considering the use of restriction fragment
length polymorphism (RFLP) for identification of its
personnel. In this technique, an individual's genomic DNA is
digested with one or more restriction enzymes, and probed on
a Southern blot to yield unique bands for identifying
personnel. This method does not suffer from the current
limitations of "Dog Tags" which can be lost, switched, or
stolen, making positive identification difficult. The
sequences of the present invention are useful as additional
DNA markers for RFLP.

However, RFLP is a pattern based technique, which does
not directly focus on the actual DNA sequence of the
individual. The sequences of the present invention can be used
to provide an alternative technique that determines the actual
base-by-base DNA sequence of selected portions of an
individual's genome. These sequences can be used to prepare
PCR primers for amplifying and isolating such selected DNA.
One can, for example, take an EST of the invention and prepare
two PCR primers from the 5' and 3' ends of the EST. These are
used to amplify an individual's DNA, corresponding to the EST.
The amplified DNA is sequenced.
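
The primer step described above can be sketched as follows: the forward primer copies the 5' end of the EST, and the reverse primer is the reverse complement of its 3' end. This is a simplified sketch; the 18-base default primer length and the function names are assumptions for illustration, and practical primer selection also considers melting temperature, self-complementarity, and specificity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def primer_pair(est: str, primer_len: int = 18) -> tuple[str, str]:
    """Return (forward, reverse) primers taken from the two ends of an EST."""
    est = est.upper()
    if primer_len <= 0 or len(est) < 2 * primer_len:
        raise ValueError("EST is too short for two non-overlapping primers")
    forward = est[:primer_len]                        # matches the 5' end
    reverse = reverse_complement(est[-primer_len:])   # anneals at the 3' end
    return forward, reverse


if __name__ == "__main__":
    est = "ACGT" * 10  # made-up 40-base sequence
    fwd, rev = primer_pair(est)
    print("forward:", fwd)
    print("reverse:", rev)
```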

Panels of corresponding DNA sequences from individuals,
made this way, can provide unique individual identifications,
as each individual will have a unique set of such DNA
sequences, due to allelic differences. The sequences of the
present invention can be used to particular advantage to
obtain such identification sequences from individuals and from
tissue, as explained in Examples 10 - 12. The EST sequences
from Example 1 and the complete sequences from Example n
uniquely represent portions of the human genome. Allelic
variation occurs to some degree in the coding regions of these
sequences, and to a greater degree in the noncoding regions.
It is estimated that allelic variation between individual
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^=..-1 = 1.2 individuals. : r o rcrrrdirg sequences of Tarle 



WO 93/00353 




PCT/US92/05222 




which each yield a noncoding amplified sequence of 100 bp. If
predicted coding sequences, such as those from Table 6, are
used, a more appropriate number of primers for positive
individual identification would be 500-2,000.

If a panel of reagents from ESTs or complete sequences of
this invention is used to generate a unique ID database for an
individual, those same reagents can later be used to identify
tissue from that individual. Positive identification of that
individual, living or dead, can be made from extremely small
tissue samples.

Another use for DNA-based identification techniques is in
forensic biology. PCR technology can be used to amplify DNA
sequences taken from very small biological samples such as
tissues, e.g., hair or skin, or body fluids, e.g., blood,
saliva, semen, etc. In one prior art technique, gene
sequences are amplified at specific loci known to contain a
large number of allelic variations, for example the DQα class
II HLA gene (Erlich, H., PCR Technology, Freeman and Co.
(1992)). Once this specific area of the genome is amplified,
it is digested with one or more restriction enzymes to yield
an identifying set of bands on a Southern blot probed with DNA
corresponding to the DQα class II HLA gene.

The sequences of the present invention can be used to
provide polynucleotide reagents specifically targeted to
additional loci in the human genome, and can enhance the
reliability of DNA-based forensic identifications. Those
sequences targeted to noncoding regions (see, e.g., Tables 8
and 9) are particularly appropriate. As mentioned above,
actual base sequence information can be used for
identification as an accurate alternative to patterns formed
by restriction enzyme generated fragments. Reagents for
obtaining such sequence information are within the scope of
the present invention. Such reagents can comprise complete
ESTs or corresponding coding regions, or fragments of either
of at least 15 bp, preferably at least 18 bp.

There is also a need for reagents capable of identifying
the source of a particular tissue. Such need arises, for
example, in forensics when presented with tissue of unknown
origin. Appropriate reagents can comprise, for example, DNA
probes or primers specific to particular tissue prepared from
the ESTs or complete sequences of the present invention.
Panels of such reagents can identify tissue by species and/or
by organ type. In a similar fashion, these reagents can be
used to screen tissue culture for contamination.

V. Production of Polypeptide Corresponding to ESTs

As previously explained, each EST corresponds not only to 
a coding region, but also to a polypeptide. Once the coding 
sequence is known, or the gene is cloned which encodes the 
polypeptide, conventional techniques in molecular biology can 
be used to obtain the polypeptide. 

At the simplest level, the amino acid sequence encoded by
the polynucleotide sequence can be synthesized using
commercially available peptide synthesizers. This is
particularly useful in producing small peptides and fragments
of larger polypeptides. (Fragments are useful, for example,
in generating antibodies against the native polypeptide.)

Alternatively, the DNA encoding the desired polypeptide
can be inserted into a host organism, and expressed. The
organism can be a bacterium, yeast, cell line, or
multicellular plant or animal. The literature is replete with
examples of suitable host organisms and expression techniques.
For example, naked polynucleotide (DNA or mRNA) can be
injected directly into muscle tissue of mammals, where it is
expressed. This methodology can be used to deliver the
[...] against a foreign polypeptide (Wolff, et al., Science 247:1465
(1990); Felgner, et al., Nature 349:251 (1991)). [...]
peptide. Such [...] polypeptide.

VI. Examples

Certain aspects of the present invention are described in
greater detail in the non-limiting Examples that follow.
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EXAMPLE 1 

cDNA Sequences Determined by Random
Clone Selection: First Set

METHODOLOGY:

With reference to the data presented in Table 1, lambda
ZAP libraries were converted en masse to pBluescript plasmids,
transfected into E. coli XL1-Blue cells, and plated on
X-gal/IPTG/ampicillin plates. A total of 1053 clones were
picked at random from three human brain cDNA libraries: fetal
brain, two-year-old hippocampus, and two-year-old temporal
cortex (Stratagene catalog #936206, 936205, 935, respectively;
Stratagene, 11099 N. Torrey Pines Rd., La Jolla, CA 92037).

An analysis of these clones is summarized in Table 1
(see below). In addition, clones selected from the Hippocampus
library were also analyzed after subtractive hybridization
with the fibroblast library. These results are listed in the
"Hippocampus Subtracted" column of Table 1. Templates for DNA
sequencing were PCR products or plasmids prepared by the
alkaline lysis method. About half of the templates prepared
by PCR failed to yield an amplified fragment suitable for
sequencing. This was primarily due to use of PCR conditions
that minimized the need for further purification of the
product but also selected against amplification of longer
[...] the pBluescript plasmid, 7.o ..M each dNTP, and 0.1 [...]
for 25 cycles: 94°C, 40 sec; 55°C, 40 sec; 72°C, [...]
Qiagen columns improved the [...]

After a cycle sequencing protocol, carried out in a
Perkin-Elmer thermal cycler, sequencing reactions were run on an
Applied Biosystems, Inc. (Foster City, CA) 373A automated DNA
sequencer. (Cycle sequencing was performed in a Perkin-Elmer
Thermal Cycler for 15 cycles of 95°C, 30 sec; 60°C, 1 sec;
70°C, 60 sec and 15 cycles of 95°C, 30 sec; 70°C, 60 sec with
the Applied Biosystems, Inc. Taq Dye Primer Cycle Sequencing
Core Kit protocol.) Some sequencing reactions were performed
on an ABI robotic workstation (Cathcart, Nature 347: 310
(1990), hereby incorporated by reference).
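
The two-stage cycle sequencing program quoted above can be written down as data, which makes it easy to check or to compute the programmed hold time. This is only a sketch: the class and function names are hypothetical, the temperatures and times are read from the text as printed, and ramp time between temperatures is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int    # hold temperature in degrees Celsius
    seconds: int   # hold time in seconds


@dataclass(frozen=True)
class Stage:
    cycles: int
    steps: tuple   # tuple of Step objects run once per cycle


# Two stages of 15 cycles each, as described in the text.
CYCLE_SEQUENCING = (
    Stage(15, (Step(95, 30), Step(60, 1), Step(70, 60))),
    Stage(15, (Step(95, 30), Step(70, 60))),
)


def hold_seconds(program) -> int:
    """Total programmed hold time, excluding temperature ramps."""
    return sum(
        stage.cycles * sum(step.seconds for step in stage.steps)
        for stage in program
    )


if __name__ == "__main__":
    total = hold_seconds(CYCLE_SEQUENCING)
    print(f"{total} s of hold time, about {total / 60:.1f} min")
```

With the values as printed, the hold time is 15 x 91 s plus 15 x 90 s, or 2715 s in total.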

RESULTS:

Single-run DNA sequence data were obtained from 609
randomly chosen cDNA clones. The number of clones sequenced
from each library is summarized in Table 1. Double-stranded
cDNA clones in the pBluescript vector were sequenced by a
cycle sequencing protocol with dye-labeled primers and Applied
Biosystems, Inc. 373A DNA Sequencers. The average length of
usable sequence was 397 bases with a standard deviation of
99 bases.

Subtractive hybridization has been used successfully to
reduce the population of highly represented sequences in a
cDNA library by selectively removing sequences shared by
another library. (Schmid and Girou, Neurochem. 48: 307
(1987); Fargnoli et al., Anal. Biochem. 187: 364 (1990); Duguid
and Dinauer, Nucl. Acids Res. 18: 2789 (1990); Schweinfest,
et al., Genet. Anal. Techn. Appl. 7: 64 (1990); Travis and
Sutcliffe, Proc. Natl. Acad. Sci. USA 85: 1696 (1988); Kato,
Eur. J. Neurosci. 2: 704 (1990)). Subtractive hybridization
was therefore tested as a way of enhancing the number of
brain-specific clones in the hippocampus library by
hybridizing the hippocampus library with a WI38 human lung
fibroblast cell line cDNA library and removing the common
sequences (Schweinfest et al., Genet. Anal. Techn. Appl. 7: 64
(1990); Sive and St. John, Nucl. Acids Res. 16: 10937 (1988)).
Clones from this subtraction are listed in the column
"Hippocampus Subtracted" in Table 1.

The EST sequences from this Example 1 are identified as
SEQ ID NOs 1-315.
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EXAMPLE 2 
EST Characterization: First Set

ESTs including SEQ ID NOs 1-315 were analyzed as follows.
Initially, the EST sequences were examined for similarities in
the GenBank nucleic acid database (GenBank Release 65.0),
Protein Information Resource Release 26.0 (PIR), and ProSite
(MacPattern from the EMBL data library, Fuchs R. Comput. Appl.
Biosci. 7: 105 (1990); Release 5.0 was used). BLAST was used
to search GenBank and the PIR (both maintained by the National
Center for Biotechnology Information). ESTs without exact
GenBank matches were translated in all six reading frames and
each translation was compared with the protein sequence
database PIR and the ProSite protein motif database.
Comparisons with the ProSite motif database were done by means
of the program MacPattern from the EMBL Data Library. GenBank
and PIR searches were conducted with the "basic local
alignment search tool" programs for nucleotide (BLASTN) and
peptide (BLASTX) comparisons (Altschul et al., J. Mol. Biol.
215: 403 (1990)). PIR searches were run on the National
Center for Biotechnology Information BLAST network service.
The BLAST programs contain a very rapid database-searching
algorithm that searches for local areas of similarity between
two sequences and then extends the alignments on the basis of
[...] other database-searching programs such as FASTA (Pearson and
Lipman, Proc. Natl. Acad. Sci. USA 85: 2444 (1988)).
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were missed due to the use of BLASTN for the database search. 
No additional statistically significant matches were found. 
Statistical significance does not necessarily mean functional
similarity; some of the reported matches may indicate the
presence of a conserved domain or motif or simply a common
protein structure pattern. Those ESTs identified as fully
corresponding to known human genes or proteins are not
included in this disclosure. Statistically significant
matches are reported in Table 2, together with the length and
percent identity or similarity of each alignment.
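
The six-frame translation step used in this Example, in which ESTs lacking an exact GenBank match are translated in all reading frames before protein database comparison, can be sketched as below. This is a minimal illustration using the standard genetic code; the function names are hypothetical, any codon containing a non-ACGT character is reported as "X", and it is not the software actually used for the searches.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict:
    """Return translations for frames +1..+3 and -1..-3."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for frame, peptide in six_frame_translations("ATGGCCTAA").items():
        print(frame, peptide)
```

Each of the six peptide strings would then be compared against the protein database separately, since the correct strand and frame of an EST are not known in advance.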

On the basis of database searches, 609 EST sequences were
classified into eight groups as shown in Table 1 (see Example
1 above). Four groups, with 197 or 32% of the sequences,
consist of matches to human sequences: repetitive elements,
mitochondrial genes, ribosomal RNA genes, and other nuclear
genes. Forty-eight (8%) of the sequences matched non-human
entries in GenBank or PIR while 230 (38%) had no significant
matches. The remaining 134 (22%) sequences contained no
insert or consisted entirely of polyA between the EcoRI
cloning sites.
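
The counts in the preceding paragraph can be checked against the stated total and percentages with a few lines of arithmetic. The dictionary keys below are shorthand labels chosen for this sketch, not the column headings of Table 1.

```python
# Counts as reported in the text for the 609 sequenced clones.
COUNTS = {
    "matches to human sequences": 197,
    "matches to non-human entries": 48,
    "no significant match": 230,
    "no insert or polyA only": 134,
}


def percentages(counts: dict) -> dict:
    """Return each count as a whole-number percentage of the total."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


if __name__ == "__main__":
    print("total:", sum(COUNTS.values()))
    for label, pct in percentages(COUNTS).items():
        print(f"{label}: {pct}%")
```

The four groups sum to 609 and round to 32%, 8%, 38%, and 22%, in agreement with the figures given.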

Thirty-six ESTs matched previously sequenced human
nuclear genes with more than 97% identity. Four of these ESTs
are from genes encoding enzymes involved in maintaining
metabolic energy, including ADP/ATP translocase, aldolase C,
hexokinase, and phosphoglycerate kinase. Human homologs of
genes for the bovine mitochondrial ATP synthase F0 β-subunit
and porcine aconitase were also found (Table 2). Brain-specific
cDNAs included synaptophysin, glial fibrillary acidic
protein (GFAP), and neurofilament light chain. At least six
ESTs are from genes encoding proteins involved in signal
transduction: 2',3'-cyclic nucleotide 3'-phosphodiesterase (2
ESTs), calmodulin, c-erbA-α-2, Gsα, and Na+/K+ ATPase α-subunit.
Other ESTs were matches to genes for ubiquitous structural
proteins: actins, tubulins, and fodrin (non-erythroid
spectrin). ESTs also document the presence in the hippocampus
cDNA library of the ret proto-oncogene, the ras-related gene
rhoB, and one of the chromosome 22 breakpoint cluster region
transcripts. Eight ESTs are from genes known to be associated
with genetic disorders (Online Mendelian Inheritance in Man).
More than half of the human-matched ESTs from Example 1 have
been mapped to chromosomes, indicating the bias of GenBank
entries toward well-studied genes and proteins.

ESTs without significant GenBank matches were also
compared to the ProSite database of recognized protein motifs.
Not counting post-translational-modification signatures,
fifty-four sequences contained motifs from the database. Some
patterns, particularly the "leucine zipper", are found in
scores or hundreds of proteins that do not share the
functional property implied by the presence of the motif.

Similarities to sequences from other organisms were also
detected in the BLAST searches of GenBank and PIR (Table 2).
Several ESTs displayed similarity to "housekeeping" genes,
including the ribosomal proteins S10 and L30 (rat) and the
above glycolytic enzymes. EST00257 (SEQ ID NO: 77) shows
strong nucleotide sequence similarity to the squid (67%) and
Drosophila (70.4%) kinesin heavy chain. Kinesin was first
described as a microtubule-associated motor protein [...]
organelle transport in the squid giant axon (Vale et al.,
Cell 42: 39 (1985)). Six oncogene-related sequences were also
[...] cDNA clones sequenced. EST00299 (SEQ ID NO: 180) and
[...] genes Notch and Enhancer of split. Nucleotide and peptide
alignments of EST00256 (SEQ ID NO: 188) and EST00259 (SEQ ID
NO: 227) with the Drosophila genes have been demonstrated. Both
genes are part of a signal cascade encoded by the "neurogenic"
genes that are involved in the differentiation of neuronal and
epidermal cell lineages in the neuroectoderm of the developing
Drosophila embryo (Campos-Ortega, Trends in Neuro. Sci. 11:
400 (1988)). It has been proposed that the Enhancer of split
protein interacts with a membrane protein that is the product
of the Notch gene to convert a developmental signal into an
altered pattern of gene expression (id. J. Mol. Biol. 215: 403
(1990)). EST00256 (SEQ ID NO: 188) matches near the 5' end of
the Enhancer of split coding sequence, away from the mammalian
G protein β subunit- and yeast cdc4-like elements (Hartley et
al., Cell 55: 785 (1988); Klambt et al., EMBO J. 8: 203 (1989)).
Part of the EST00259 (SEQ ID NO: 227) match to Notch is in the
cdc10/SWI6 region that is similar to three cell-cycle control
genes in yeast and is tightly conserved in the Xenopus Notch
homolog, Xotch. In Drosophila, Enhancer of split is
absolutely required for formation of epidermal tissue. Notch
contains several epidermal growth factor-like repeats and
appears to play a general role in cell-cell communication
during development (Banerjee and Zipursky, Neuron 4: 177
(1990)).

Seven genes were represented by more than one EST.
Comparisons of all the ESTs against one another revealed two
overlaps of unknown ESTs: EST00233 (SEQ ID NO: 32) and
EST00234 (SEQ ID NO: 8) match in opposite orientations and
EST00235 (SEQ ID NO: 204) and EST00236 (SEQ ID NO: 148) match in
the same orientation beginning at the same nucleotide. Five
human genes were represented by more than one EST: β-actin
(3), γ-actin (2), α-tubulin (2), α-2-macroglobulin (2), and
2',3'-cyclic-nucleotide-3'-phosphodiesterase (2). Those few
instances where two or more ESTs represent different portions
of a single cDNA can be readily ascertained when the sequence
of the full cDNA insert is determined in accordance with
Table 2: ESTs Identified by Database Matches

SEQ ID       EST#      Putative Identification                             Accession   DB   Len          %ID
[illegible]  EST00250  60K filarial antigen                                A2S2D9      PIR  [illegible]  [illegible]
97           EST00289  Aconitase                                           A35544      PIR  [illegible]  [illegible]
251          EST00370  Actin, other                                        S10021      PIR  44           51.1
248          EST00271  Actinin, alpha                                      HUMACTAR    GB   271          85.3
132          EST00110  Agrin                                               RATAGR      GB   269          82.2
13           EST00255  Cadherins                                           CADN$HUMAN  SP   41           45.2
188          EST00256  Enhancer of split                                   A30047      PIR  86           58.6
310          EST00377  Fo ATPase beta subunit, mitochondrial               BOVMTASB    GB   293          85.4
77           EST00257  Kinesin                                             A35075      PIR  57           86.2
78           EST00258  Kinesin                                             A35075      PIR  62           47.6
313          EST00276  Lysosomal membrane glycoprotein 1 (LAMP-1)          A31959      PIR  53           46.3
161          EST00247  MARCKS (myristoylated alanine-rich protein kinase   BOVMARCKS   GB   139          83.6
43           EST00371  Maternal G10 protein                                S05955      PIR  38           92.3
223          EST00368  Microtubule-associated protein 1B                   A33645      PIR  30           54.8
227          EST00259  Notch/Xotch                                         A35844      PIR  74           85.3
93           EST00287  Processing enhancing protein                        S03968      PIR  96           58.8
9            EST00376  Prolyl endopeptidase                                PIGPREP     GB   223          83.9
202          EST00298  Protein-tyrosine phosphatase LRP                    LRP$MOUSE   SP   62           44.4
38           EST00374  RNA polymerase II 6th subunit (RPO26)               A36352      PIR  72           75.3
37           EST00038  ras p21-like small GTP-binding protein (smg GDS)    BOVSMGGDS   GB   131          89.4
180          EST00299  ras-related proteins                                S10493      PIR  51           46.1
102          EST00248  rho H12/ARH12                                       BOVBGSRH    GB   195          79.6
301          EST00300  Ribosomal protein L30                               R6RT30      PIR  57           96.5
22           EST00301  Ribosomal protein S10                               R3RT10      PIR  66           97.0
299          EST00249  smg p25A GDP dissociation inhibitor                 A35652      PIR  97           77.5
300          EST00232  Transforming protein (dbl)                          TVHUDB      PIR  25           65.4
189          EST00282  trkB                                                A35104      PIR  33           67.6
187          EST00152  Wilm's tumor-related protein                        HUHOM       GB   228          99.6
249          EST00275  Zinc Finger Proteins                                S06551      PIR  25           57.7

There is little redundancy in EST sequencing according
to the present invention. Of the nuclear-encoded messenger
RNAs, the most common ESTs were to the β-actin
(approximately 0.6% of the EST clones) and myelin basic
protein genes (MBP, approximately 0.5% of the clones). MBP,
a highly expressed structural component of nerve tissue
(Kamholtz, J., de Ferra, F., Puckett, C., & Lazzarini, R.
Proc. Natl. Acad. Sci. USA 83: 4962-4966 (1986)), displays
four alternate splicing forms, of which it is believed at
least two are present among the ESTs reported here. Other
common ESTs were Gs-alpha, gamma-actin and both a- and alpha-
tubulin.

By matching ESTs to known database sequences, a 
phenctypic characterization cf the tissue begins to emerge. 
Protein superf amiiies matched by ESTs were grouped into 
three broad functional categories to assess the biological 
spectrum represented by these randomly selected cDNA clones. 
Structural and metabolic classes comprised about 30% cf the 
ESTs with database matches. Twenty-five percent were 
involved in regulatory pathways and the remainder were not 
classifiable. In addition, it is believed that several genes 
net previously known tc be expressed in the brain were 
matched, including spermine/spermidine acetyltrans f erase 
(uaserc, P., Celanc, ?, Ervin, S., Appiegren, N . , wiest, L. 
£ --egg, A . J. Biol. Chea. 266: 0IC-S14 (1991)) and 
cstecpcr.tir. (Ycur.g, v., Kerr, J., Te-ine, J., Wewer , "., 
Wang, v., v c5ridef w _ - Fisher - ( Genomics 7:4 91-502 

(15 9 0) ) . 



EXAMPLE 3 

Mapping of ESTs to Human Chromosomes
Randomly selectee ZSTs ccrresr encme tc Seru en 



dentif icat. 



= were assigned tc chrcmcscmes v 



= ce.able~ - _ . 

- • ~ _ _ r p airs were cesirr.ee 

: _ . — _ 

— — - — ~ — — tt . . _ tr 3 — - - - ~, ^_ _ — — . — „ ,c _. 
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through an intron. The oligonucleotides were 18-2 3 bp in 
length and designed for PCR amplification using the computer
program INTRON (National Institutes of Mental Health,
Bethesda, MD). The program is based on the assumptions
that: 1) introns are genomic sequences that interrupt the
coding and noncoding sequences of genes (Smith, J. Mol.
Evol. 27:45-55 (1988)); 2) there are consensus sequences
for splice junctions (Shapiro, et al., Nucl. Acids Res.
15:7155-7174 (1987)); and 3) that 90% of the human genes
studied have 3' untranslated regions of mRNA not interrupted
by introns in the genomic DNA (Hawkins, Nucl. Acids Res.
16:9893-9908 (1988)).

The program evaluates the likelihood that a given GG or
CC dinucleotide represents a former exon-intron boundary.
Specifically, every input strand is processed by the INTRON
program twice, first evaluating the sense mRNA strand, and
then processing the complementary or anti-sense strand. The
program evaluates each sequence by finding all GG or CC
pairs (possible former splice sites), searching for STOP
codons in all three reading frames, and analyzing the GG or
CC pairs surrounded by stop codons. All regions of the EST
that are unlikely to contain splice junctions based on CC
content, GG content, and stop codon frequency are then
marked by the program in uppercase.
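
The scanning steps described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the INTRON program itself, whose scoring rules are not given here; the function names (scan_strand, scan_est and so on) are hypothetical. The sketch finds every GG or CC dinucleotide, lists stop codons in each of the three reading frames, and repeats the scan on the complementary strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the anti-sense strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def dinucleotide_positions(seq, dinucs=("GG", "CC")):
    """0-based start positions of every GG or CC pair (candidate former splice sites)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in dinucs]


def stop_codon_positions(seq):
    """Map each reading frame (0, 1, 2) to the start positions of its stop codons."""
    seq = seq.upper()
    return {
        frame: [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]
        for frame in range(3)
    }


def scan_strand(seq):
    """Collect candidate sites and stop codons for one strand."""
    return {
        "candidate_sites": dinucleotide_positions(seq),
        "stops": stop_codon_positions(seq),
    }


def scan_est(seq):
    """Process the sense strand first, then the complementary strand."""
    return {
        "sense": scan_strand(seq),
        "antisense": scan_strand(reverse_complement(seq)),
    }
```

A real implementation would go on to weigh GG content, CC content, and stop codon frequency to decide which regions to mark in uppercase; that weighting is left out because the text does not specify it.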

The creation of PCR primers from known sequences is
well known to those with skill in the art. For a review of
PCR technology see Erlich, H.A., PCR Technology: Principles
and Applications for DNA Amplification, 1992; W.H. Freeman
and Co., New York. ESTs were examined for the presence of
stop codons in each reading frame and for consensus splice
junctions. The presence of stop codons and absence of
splice junction sequences are more characteristic of 3'
untranslated sequences than of introns. The untranslated
sequences are unique to a given gene; thus, primers from
these regions are less likely to prime other members of a
gene family or pseudogenes.

The primers were used in polymerase chain reactions
(PCR) to amplify templates from total human genomic DNA.
PCR conditions were as follows: 60 ng of genomic DNA was
used as a template for PCR with 80 ng of each
oligonucleotide primer, 0.6 unit of Taq polymerase, and 1
µCi of a 32P-labeled deoxycytidine triphosphate. The PCR
was performed in a microplate thermocycler (Techne) under
the following conditions: 30 cycles of 94°C, 1.4 min; 55°C,
2 min; and 72°C, 2 min; with a final extension at 72°C for
10 min. The amplified products were analyzed on a 6%
polyacrylamide sequencing gel and visualized by
autoradiography. If the size of the resulting product was
equivalent to the EST from which the primers are derived,
then the PCR reaction was repeated with DNA templates from
two panels of human-rodent somatic cell hybrids: BIOS
PCRable DNA (BIOS Corporation) and NIGMS Human-Rodent
Somatic Cell Hybrid Mapping Panel Number 1 (NIGMS, Camden,
NJ).

PCR was used to screen a series of somatic cell hybrid
cell lines containing defined sets of human chromosomes for
the presence of a given EST. DNA was isolated from the
somatic hybrids and used as starting templates for PCR
reactions using the primer pairs from EST sequences selected
above. Only those somatic cell hybrids with chromosomes
containing the human gene corresponding to the EST will
yield an amplified fragment. ESTs were assigned to a
chromosome by analysis of the segregation pattern of PCR
products from hybrid DNA templates. For a review of
techniques and analysis of results from somatic cell gene
mapping experiments, see Ledbetter et al., Genomics 6:475-
481 (1990). The single human chromosome present in all
hybrids that give rise to an amplified fragment
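
The segregation analysis described above amounts to a concordance test, which the following Python sketch illustrates. It is a simplified illustration under stated assumptions: the hybrid names and chromosome contents in the usage note are invented, the function name assign_chromosome is hypothetical, and real panels usually tolerate a small number of discordant hybrids, which this sketch does not.

```python
def assign_chromosome(pcr_result, hybrid_content):
    """Return the chromosomes whose presence matches the PCR result in every hybrid.

    pcr_result: dict mapping hybrid name -> True if an amplified fragment was seen.
    hybrid_content: dict mapping hybrid name -> set of human chromosomes retained.
    """
    all_chromosomes = set().union(*hybrid_content.values())
    concordant = []
    for chrom in sorted(all_chromosomes):
        # A chromosome is concordant when it is present in every positive
        # hybrid and absent from every negative hybrid.
        if all((chrom in hybrid_content[h]) == positive
               for h, positive in pcr_result.items()):
            concordant.append(chrom)
    return concordant
```

For example, with a hypothetical three-hybrid panel in which hybrid A retains chromosomes 1 and 6, hybrid B retains 6 and X, and hybrid C retains 1 and X, an EST that amplifies from A and B but not from C is concordant only with chromosome 6.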
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Table 3: Assignment: of ESTs to Chromosomes by PCR 

SEQ ID       EST#         Chr          PRIMER #1                    PRIMER #2

5            EST00012     [illegible]  TCCAGGCAATCCCAGAATAG         CTAATTGAGCTCACTGGCCC
57           EST00058     [illegible]  CTGTTTGCAAGTTTCAAAGC         GCCATTTCTAACAACCAGAG
[illegible]  EST00066     [illegible]  GCCATTGTGCTGAATAGAGT         GTTAGTGTTTCCTTAGCAAG
[illegible]  EST00079     1            CAGCTAATTGACCTGGGCTA         CAACATGCTCTGAGCTTTAG
[illegible]  EST00079     1            GGCAGAGCATAATGAGTATA         CATATGCATATGGTCCCTAT
[illegible]  [illegible]  [illegible]  AGTTTAGATGGAGGGCTGTC         TCTGCCCTAATGCGCAGGCT
[illegible]  [illegible]  1            CTTAATCACCTCCCTTTTGT         CCTTAGTTGGAGATAAGGTC
[illegible]  EST00095     1            AGTCTAATCCTGTACACTTG         CGGGCTTTCTCTGAATTGGT
[illegible]  [illegible]  1            TTAGAAGTGCCCATGGGAGG         TTTTAAGGCTCTGGAGTGTT
[illegible]  [illegible]  [illegible]  P7CAGAGAAACTTAGGTGAA         CTACAGAATCATTTCACCAG
[illegible]  [illegible]  1            AAGTTGCACATTGCCCAAGG         ATAGTACTGCAAGGTTATTC
[illegible]  EST00187     1            TTACAAATTTCTCTTGACGC         CTGAAGGAGCACAGTTTCTC
[illegible]  [illegible]  1            GGATCAGATAATCAAACAGG         GCTTAGGATATGAATGCATA
[illegible]  [illegible]  1            GTATGACAGTTTAACTGAGG         CTACATATTTGTGCCTCCTT
[illegible]  [illegible]  [illegible]  CTGTTGCTGTGCAGTAGCTT         CTTTTGACCCAGTGAAACTT
[illegible]  [illegible]  [illegible]  PATPATGPAGACGTAGATAT         CCAACTCCTGCCAGATCATT
[illegible]  [illegible]  2            PAPnrAAGTTTCT^CCAGGA         TCAGACCCATGGTCAGCTT
[illegible]  [illegible]  2            TAGAAGGCAAACTATGTCCC         GGTTGAGGATTGGCTTTTAC
[illegible]  [illegible]  2            APPGAGAAGGGTGCTTAAAG         GCAGTGAACCAGTACTCCTA
[illegible]  [illegible]  2            PTPTAATTTGTAACCTTCAG         GATAGATTGTATAAGAAGCC
[illegible]  [illegible]  2            GATTTATGTCTGGGAACTAA         GCAGCATGTGAAAGAATGAT
[illegible]  [illegible]  2            TTTAATGGGTGGTGGGAGCT         CGATGCACATCCTTCTCCAT
[illegible]  [illegible]  2            GGTAAGAATTCGTTTGGCTC         GTCTGGCACATAATAGATTTG
[illegible]  [illegible]  [illegible]  ATACTACATCTAGTCTGG           TTACAGTTCTGTGGTTTC
167          [illegible]  3            AAACAGCTGCGGAGTACA           AAAGGATCCTCCACTCCAGA
[illegible]  [illegible]  3            CCTAGCAAACTCATACACAC         CATAAGTGAATGGACACAGG
60           EST00062     3            ACACATTAACGGTGCTGCAG         GGAATCAGCCCTTGAGGACT
77           EST00257     3            AAGCTCACAACGCAGATCTG         CTGGAACAGCTTACAAAGGT
107          EST00093     3            ATTGAACTCTGTCAACAGTG         TGTAAAACAAAGGCCAAACT
108          EST00094     3            AL2 - GCAGGATGTCAGTCTTTTGAG  AGCACACATTATCTACCACGGC
37           EST00038     4            AACTTCGCAGTCATGAGAAC         TGTATCGGGCAGTTCTCAG
[illegible]  EST00013     4            CACATGTTCTCCCTCTTTCA         GCATTTTGGAGCTCTTCCGT
37           EST00038     4            AL2 - GGAAGTACAGGATTTGGC     TTAGAGATGGGATGATGCCG
31           EST00033     5            TGGGTACCCTAAGGTGTTTG         GACTAATCTAAGGTCTAGG
28           EST00030     5            AGATAAGTTAGGAAGCTGGT         ACTCACTGCTAGTATCATCC
59           EST00061     5            AAAGTTTCTTAGCACCCCCC         CAGACTTTGACAAAAGAATC
74           EST00073     5            ATCAGACACGTGGCAGGGTT         AAGTCCCTGAGGGTGCAGAA
121          EST00104     5            TGAAGGCAGCTGCTAAATCT         GGATGTATTGATCTGACTCA
149          EST00123     5            ATACTGTCAACGGAGGGTGA         GTCTGCAGGTTTCTCCTTGA
235          EST00185     5            TTACTGTCCCATCAGATATC         TACACTCTTAAGAAGGTATG
23           EST00026     5            CCTGCAGTGACACTTAACAT         CTGCTCACCTGAAATTGATAC
121          EST00104     5            AL2 - CAGATCAATACATCCTCTGGG  CTGTGCAGTGGTGAGTAAAAGG
1            EST00007     6            TAGTTGATGGTCTGGGTTAT         GAAATCCCAGGGAGACAATG
19           EST00023     6            CAACTTACATTAGGGGTTTG         GACCTCATTAGAAGAGCCCA
155          EST00129     6            GGAAGCTGCCATATAAGCTC         TCAGTGTCGTACAATCTACC
224          EST00356     6            GCTGTATGTTAACCCTTTGT         TGGAACCCTCAAACACTGCT
288          EST00219     6            ACTTTCATGTTGAGAAGTAT         ATCTAGCTGAAACATTGCTG
22           EST00301     6            CTCCGTGATTACCTTCATCT         TTGTAGGTATCTCTGTCAGCT
207          EST00167     7            GGTGCTACTTTGTGAATGCT         AGCAATGTGATTTTGTAGG
137          EST00272     7            AGTGGTCACTATCTACAiGG         GATTCAGAATTACTAAGCCG
292          EST00223     8            TGCAGCAGTGACCATGAGAA         ATCATCTTTCCACGCGGCTT
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??.:mer 



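SEQ ID       EST#         Chr

[illegible]  EST00375     9
20           EST00024     10
157          EST00121     [illegible]
172          EST00142     [illegible]
250          EST00197     10
123          EST00111     [illegible]
178          [illegible]  11
10           EST00016     11
126          EST00109     11
7            EST00014     12
254          EST00200     [illegible]
170          EST00295     [illegible]
255          EST00201     [illegible]
290          [illegible]  [illegible]
293          EST00224     [illegible]
215          [illegible]  [illegible]
95           [illegible]  [illegible]
205          [illegible]  [illegible]
[illegible]  [illegible]  [illegible]
247          EST00279     [illegible]
[illegible]  EST00373     [illegible]
[illegible]  [illegible]  [illegible]
[illegible]  [illegible]  19
225          [illegible]  [illegible]
210          [illegible]  [illegible]
156          EST00115     [illegible]
[illegible]  EST00132     [illegible]
162          [illegible]  [illegible]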







., C T GGGCTTCTGTGGTTCAA 

A.G C T G TT C C T G A G A G AT G C A 
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The foregoing techniques have been used to further localize 
6 ESTs and their associated genes to precise locations on
chromosome 6 or chromosome X, as reflected in Table 4 (in
Example 5 below), using sublocalization techniques that employ
somatic cell hybrids. ESTs were used as hybridization probes
and mapped to other chromosomes using techniques disclosed in
Example 5. Somatic cell hybrids were prepared that contained
defined subsets of chromosomes 6 and X. Methods for preparing
and selecting somatic cell hybrids are known in the art. For a
review of an exemplary procedure to generate somatic cell
hybrids containing the short arm of human chromosome 6, see
Zoghbi, et al., Genomics 9(4):713-720 (1991). For a general
review of somatic cell hybridization see Ledbetter et
al. (supra). The hybrids were processed to obtain DNA and
analyzed by PCR and by fluorescence in situ hybridization. SEQ
ID NOs 19, 22, 1, 224, 288 mapped to chromosome 6, while SEQ ID
NO 162 mapped to chromosome X using somatic cell hybrids.

EXAMPLE 4 

Mapping of All ESTs to Human Chromosomes

The procedure of Example 3 is repeated for all of the ESTs 
from Example 1 not previously mapped to human chromosomes. Data 
are generated corresponding to the data in Table 3 for all of 
the unmapped ESTs. As previously mentioned, virtually all of 
the ESTs will map to a unique chromosomal location. The 
inability of any ESTs to localize to a unique location will be 
readily ascertainable during the mapping process. 

EXAMPLE 5 

Alternative Techniques for Mapping to Chromosomes
Mapping of ESTs to chromosomes using fluorescence in situ
hybridization

This technique is used to map an EST to a particular location
on a given chromosome. Cell cultures, tissue, or whole blood can
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be used to obtain chromosomes . 

0.5 ml of whole blood is added to RPMI 1640 and incubated 96
hours in a 5% CO2/37°C incubator. 0.05 µg/ml colcemide is added to
the culture one hour before harvest. Cells are collected and
washed in PBS. The suspension is incubated with a hypotonic
solution of KCl added dropwise to reach a final volume of 5 ml.
The cells are spun down and fixed by resuspending the cells in
methanol and glacial acetic acid (3:1). The cell suspension is
dropped onto glass slides and dried.

The slides are then treated with RNase A and washed, then
dehydrated in a series of increasing concentrations of ethanol.

The EST to be localized is nick-translated using
fluorescently labeled nucleotide (Korenberg, J.R., et al., Cell
53(3):391-400 (1988)). Following nick translation, unincorporated
label is removed by spin dialysis through Sepharose. The probe is
further extracted with phenol-chloroform to remove additional
protein. The chromosomes are denatured in formamide using
techniques known in the art and the denatured probe added to the
slides. Following hybridization, the cells are washed. The
slides are studied under a fluorescent microscope. In addition,
the chromosomes can be stained for G-banding or Q-banding using
techniques known in the art.

The resulting metaphase chromosomes have fluorescent tags
localized to those regions of the chromosome that are homologous
to the EST. Thus, a particular EST is localized to a particular
region on a given chromosome. For a review of the technique, see
Verma et al., Human Chromosomes: A Manual of Basic Techniques.
Pergamon Press.
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Table 4: Precise 



Chromosomal Localization of ESTs 



SEQ ID EST# Map Location 

19 EST00023 6p 

22 EST00301 6p 

1 EST00007 6q 

224 EST00356 6q 

288 EST00219 6q 

162 EST00133 Xp11.21 - Xp21.2

EXAMPLE 6 
Automated DNA Sequencing Accuracy

ESTs that match human sequences in GenBank are
excellent tools for the analysis of the accuracy of double-
strand automated DNA sequencing. EST/GenBank matches from a
number of clones were examined for the number of nucleotide
mismatches and gaps required to achieve optimal alignment using
the Genetics Computer Group (GCG) program BESTFIT (Devereux
et al., Nucleic Acids Research 12: 387 (1984)). The number
of mismatches, insertions and deletions was counted for each
hundred bases of the sequence (Table 5). As expected, the
sequence quality was best closest to the primer and
decreased rapidly after about 400 bases. The number of
deletions and insertions relative to the GenBank reference
sequence increased five- to ten-fold beyond 400 bases, while
the number of mismatches doubled. The average accuracy rate
for individual double-stranded sequencing runs was 97.7% to
400 bases.
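
The per-hundred-base tally described above can be illustrated with a short Python sketch. It assumes the EST and the GenBank reference have already been aligned (for example by BESTFIT) and are supplied as two equal-length strings in which "-" marks a gap; the function name and the input format are assumptions made for illustration, not part of the original analysis.

```python
def error_counts_per_window(est_aln, ref_aln, window=100):
    """Count mismatches, insertions and deletions per `window` EST bases.

    est_aln, ref_aln: gapped alignment rows of equal length ('-' marks a gap).
    An insertion is an EST base opposite a reference gap; a deletion is a
    reference base opposite an EST gap.
    """
    if len(est_aln) != len(ref_aln):
        raise ValueError("alignment rows must have equal length")
    counts = []
    est_pos = 0  # number of EST bases consumed so far (distance from the primer)
    for e, r in zip(est_aln.upper(), ref_aln.upper()):
        idx = est_pos // window
        while len(counts) <= idx:
            counts.append({"bases": 0, "mismatch": 0, "insertion": 0, "deletion": 0})
        if e == "-":
            counts[idx]["deletion"] += 1
            continue
        counts[idx]["bases"] += 1
        if r == "-":
            counts[idx]["insertion"] += 1
        elif e != r:
            counts[idx]["mismatch"] += 1
        est_pos += 1
    return counts
```

Dividing the error counts in each window by the number of bases in that window gives a per-window error rate, which is one way to see accuracy fall off with distance from the primer.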
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EXAMPLE 7 

Probability of ESTs Containing Coding Sequences

The ESTs of the present invention were statistically
evaluated using the coding-region prediction program CRM via
the GRAIL server (Uberbacher, E. & Mural, R. Proc. Natl.
Acad. Sci. USA, 88: 11261-5 (1991)). The CRM program uses a
neural network to combine results from several different
coding regions by looking at different 6 bp sequences found
in coding exons and in introns. The program additionally
conducts reading frame searches and assesses randomness at
the third position of codons. This protocol categorizes
sequences as having an excellent, good, marginal, or poor
probability of containing coding regions. The results are
reported in Tables 6-9. There were 32 ESTs categorized as
"excellent" (Table 6); 14 categorized as "good" (Table 7);
13 categorized as "marginal" (Table 8); and 213 categorized
as "poor" (Table 9). These results indicate that most ESTs
of the present invention comprise noncoding regions.
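
Two of the raw signals mentioned above, 6 bp word usage and base composition at the third codon position, are easy to tabulate. The Python sketch below does only that tabulation; it is not the CRM program, it contains no neural network, and the function names are hypothetical.

```python
from collections import Counter


def hexamer_counts(seq):
    """Count every overlapping 6 bp word in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + 6] for i in range(len(seq) - 5))


def third_position_counts(seq, frame=0):
    """Count the bases found at the third position of each codon in one frame."""
    seq = seq.upper()
    return Counter(seq[i + 2] for i in range(frame, len(seq) - 2, 3))
```

A classifier in the spirit of the one described would compare such counts against reference tables built from known coding exons and introns; those tables are not reproduced in this document.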
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Table 7: ZSTs with Good Probability of Containing Coding Sequence 



SEQ ID    EST#

20        EST00024
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EST00071 


82 


EST00078 


88 
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EST00272 
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EST00328 
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EST00156 
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EST00204 
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EST00297 


296 


EST00228 
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Table 9: ESTs with Poor Coding Probability 
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EXAMPLE 8 

Functional Groupings of ESTs and Corresponding Genes

By matching new human ESTs to known sequences from other
species, the apparent function of the gene corresponding to the EST
can be ascertained. The data generated in Example 2 have been used
to categorise 28 of the ESTs of the present invention, and their
corresponding genes, into predicted functional groups. (These 28
are ESTs with database matches to sequences from other species for
which a function was known.) Two different grouping schemes have
been used.

The first scheme separates the sequences into three broad
categories: metabolic; regulatory; and structural. These groupings
are set out in Table 10.

The second grouping scheme separates the sequences into 13
specific categories: cell surface proteins; developmental control;
energy metabolism; kinases and phosphatases; oncogenes; other
metabolism-related polypeptides; peptidases and peptidase
inhibitors; receptors; structural and cytoskeletal; signal
transduction; transporters; transcription, translation, and
subcellular localisation; and transcription factors. These
groupings are set out in Table 11.
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Table 11: ThLrteen-Class Functional Groupings of ESTs 

Group Putative Identification 
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EXAMPLE 9 



cDNA Libraries Generated From Specific Genomic DNA
by Exon Expression & Amplification

Exon amplification is used to express potential exons
from genomic DNA in a recombinant vector that contains some of
the signals necessary for splicing. If an exon is present in
the proper orientation in the vector, that exon will be
spliced in a mammalian cell and will become part of the mRNA
of that cell. The exon splice-product can be purified from
other mRNA in the cell by conversion of the mRNA to cDNA and
selective amplification of the recombinant splice-product
cDNAs. Cosmid DNA from human chromosome 19q13.3 is digested
with BamHI or BamHI/BglII restriction enzymes. The fragments
generated are collected and size specifically cloned into an
expression vector (Buckler, et al. Proc. Nat'l. Acad. Sci.
USA, 88:4005-4009 (1991)). After transfection by
electroporation of these constructs into COS cells, RNA
transcripts are generated using the SV40 early promoter and a
polyadenylation signal derived from SV40, both present in the
expression vector. When a fragment of genomic DNA contains an
entire exon with flanking intron sequence in the sense
orientation, the exon should be retained in the mature
poly(A)+ cytoplasmic RNA. Therefore, the mRNA is used as
template for cDNA synthesis using reverse transcriptase and
vector-priming. Subsequently, the cDNAs are amplified by
vector-priming using PCR. A fraction of this first PCR
product is reamplified using internal vector-primers
containing terminal cloning sites. These products are
end-repaired with T4 DNA polymerase, digested with the appropriate
restriction enzymes, gel purified and cloned into pBluescript
vectors. The constructs are transfected into XL1-Blue
competent cells and plated on LB/X-gal/IPTG/ampicillin plates.
When multiple cosmids or YAC clones are used as the source
DNA, a pool of specific expressed exons is obtained as a cDNA
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EXAMPLE 10 
PCR Amplification from Predicted Exons

Computational analyses can be applied to genomic
sequences to predict protein coding regions. The coding
region prediction program CRM (E. Uberbacher and R. Mural,
Proc. Natl. Acad. Sci. USA 88: 11261-5 (1991)) finds open
reading frames and classifies them according to their
probability of being coding regions. These regions are
subsequently examined using the GM program (C. Fields and C.
Soderlund, Comp. Applic. Biosci. 6: 262, 1990), which predicts
intron-exon structure. PCR primers are then designed to
amplify the predicted exons and used to test human cDNA
libraries (for example, fetal brain or placental libraries)
for the presence of these putative exons using a PCR assay.
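
As a rough illustration of the primer design step, the Python sketch below takes a predicted exon, given as coordinates on a genomic sequence, and returns a forward primer from its 5' end and a reverse primer that is the reverse complement of its 3' end. The coordinate convention (0-based, half-open), the default primer length, and the function names are assumptions made for this sketch; practical primer design would also check melting temperature and specificity.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def primers_for_exon(genomic, exon_start, exon_end, primer_len=20):
    """Return (forward primer, reverse primer, expected product length).

    exon_start and exon_end are 0-based, half-open coordinates of the
    predicted exon within `genomic` (an assumed convention).
    """
    exon = genomic[exon_start:exon_end].upper()
    if len(exon) < 2 * primer_len:
        raise ValueError("exon too short for two non-overlapping primers")
    forward = exon[:primer_len]
    reverse = reverse_complement(exon[-primer_len:])
    return forward, reverse, len(exon)
```

A product of the expected length amplified from a cDNA library would then support the prediction that the exon is expressed.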

This strategy has been successfully applied in two large
scale genomic sequencing projects, the Huntington's locus of
human chromosome 4p16.3 (McCombie, et al., submitted) and
human chromosome locus 19q13.3 (Martin-Gallardo, et al.,

EXAMPLE 11 
Complete Sequence of EST Clone Inserts 

There are a number of methods known to those with skill
in the art of molecular biology, to obtain sequence

Procedures for these methods are provided in Basic Methods in
Molecular Biology (Davis et al., supra). One way to acquire
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(for deletions from the Forward primer and Reverse primer ends 
of the insert, respectively). The KpnI and PstI enzymes leave
3' sticky ends following digestion, which Exonuclease III is
unable to bind. This results in unidirectional deletions into
the cDNA insert leaving the vector sequence undisturbed.
After addition of Exonuclease III to the Forward and Reverse
deletion reactions, aliquots of the reaction were removed at
defined time intervals and the reaction was stopped to prevent
further deletion. S1 nuclease and Klenow DNA polymerase were
added to create blunt ended fragments suitable for ligation.

Samples for each time point were purified by
electrophoresis through an agarose gel and religated. Two to
four representative clones from each time point in each
direction were sequenced to give between 200 and 400 base
pairs of sequence data. Careful selection of deletion
conditions and time points allows a deletion series of
approximately 100-200 base pairs difference in length at each
consecutive time point. Sequence fragments were reassembled
into a redundant contiguous sequence using the INHERIT
software from Applied Biosystems, Inc. (Foster City, CA). In
this way, the complete insert from these four cDNA clones was
sequenced on both strands to an average redundancy between
three and four (each base was sequenced between three and four
times, on average).
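
The notion of average redundancy used above can be made concrete with a small Python sketch. It assumes each sequencing read has already been placed on the insert and is described by 0-based, half-open (start, end) coordinates; the function names and that input format are illustrative assumptions, and this is not a description of how the INHERIT software works.

```python
def coverage_profile(reads, insert_length):
    """Return the number of reads covering each base of the insert."""
    depth = [0] * insert_length
    for start, end in reads:
        for pos in range(max(start, 0), min(end, insert_length)):
            depth[pos] += 1
    return depth


def average_redundancy(reads, insert_length):
    """Average number of times each base of the insert was sequenced."""
    return sum(coverage_profile(reads, insert_length)) / insert_length
```

With hypothetical reads [(0, 250), (250, 500), (0, 500)] on a 500 bp insert, every base is covered twice, so the average redundancy is 2.0.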

EXAMPLE 12

Determining Reading Frame, Orientation, Coding Regions:
ESTs and Complete cDNA Sequences

Once the complete cDNA sequence has been determined in
accordance with Example 11, the reading frame, orientation,
and coding regions are determined by computer techniques.
(The complete coding region is considered to be the largest
open reading frame from a methionine to a stop codon.)
Specifically, the CRM program on the GRAIL server is used
as explained in Example 7 to determine probable coding
regions. This information is supplemented by location of
start and stop codons. Where possible, the results of the CRM
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analysis are validated by comparison of the cDNA sequence to 
known sequences using database matching, in accordance with
Example 2. If a match of 50% (or even less) is found in any
particular reading frame and orientation, this serves to 
verify corresponding CRM results. Alternatively, database 
matches can be used to determine reading frame and orientation 
without use of the CRM program. Of course, if the cDNA is 
derived from a directional library, the probable orientation 
is already known. 
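
The working definition given above, the largest open reading frame from a methionine to a stop codon, can be sketched directly in Python. The sketch searches all three frames on both strands and uses the standard stop codons; it is an illustration only, does not involve the CRM program, and the function name and result format are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_longest_orf(seq):
    """Return the longest ATG-to-stop ORF over six frames, or None if there is none.

    Coordinates are 0-based and half-open on the strand that was scanned;
    the reported length includes the stop codon.
    """
    best = None
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first methionine codon in this open stretch
                elif start is not None and codon in STOP_CODONS:
                    length = i + 3 - start
                    if best is None or length > best["length"]:
                        best = {"strand": strand, "frame": frame,
                                "start": start, "end": i + 3, "length": length}
                    start = None
    return best
```

The strand and frame of the result give the orientation and reading frame; a cDNA from a directional library would only need the "+" strand to be searched.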



EXAMPLE 13

Preparation of PCR Primers and Amplification of DNA

The EST sequences and the corresponding cDNA sequences
and genomic sequences may be used, in accordance with the
present invention, to prepare PCR primers for a variety of
applications. The PCR primers are preferably at least 15
bases, and more preferably at least 18 bases in length. The
procedure of Example 3 is repeated using the desired EST, or
using the corresponding cDNA or genomic DNA sequence from
Example 11. It is preferred that the primer pairs have
approximately the same G/C ratio, so that melting temperatures
are approximately the same. When screening cDNA, introns are
no concern; however, when screening genomic DNA, primers

across introns, which 
arge to amplify. The PCR primers and 
:is Examrle find use in t: 



— ^ ^ ~ _ s-r_e_ta ^ o a v o i a reaomc 
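
The preference for primer pairs with similar G/C ratios can be checked numerically. The Python sketch below computes the G/C fraction of a primer and a rough melting temperature using the Wallace rule, Tm = 2(A+T) + 4(G+C) degrees C, a common approximation for short oligonucleotides. The choice of that rule is an assumption of this sketch; the document does not say how melting temperatures were estimated.

```python
def gc_fraction(primer):
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return sum(primer.count(base) for base in "GC") / len(primer)


def wallace_tm(primer):
    """Approximate melting temperature in degrees C (Wallace rule)."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    at = primer.count("A") + primer.count("T")
    return 2 * at + 4 * gc
```

Applied to the first primer pair listed in Table 3, TCCAGGCAATCCCAGAATAG and CTAATTGAGCTCACTGGCCC, the rule gives 60 and 62 degrees C respectively, which is the kind of close match the text recommends.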
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EXAMPJLE 14 

Forensic Matching by DNA Sequencing

In one exemplary method, DNA samples are isolated from
forensic specimens of, for example, hair, semen, blood or skin
cells by conventional methods. A panel of PCR primers derived
from a number of the sequences of Example 1, 9, 10 and/or 11
is then utilized in accordance with Example 10 to obtain DNA
of approximately 100-200 bases in length from the forensic
specimen. Corresponding sequences are obtained from a
suspect. Each of these identification DNAs is then sequenced,
and a simple database comparison determines the differences,
if any, between the sequences from the suspect and those from
the sample. Statistically significant differences between the
suspect's DNA sequences and those from the sample conclusively
prove a lack of identity. This lack of identity can be
proven, for example, with only one sequence. Identity, on the
other hand, should be demonstrated with a large number of
sequences, all matching. Preferably, a minimum of 50
statistically identical sequences of 100 bases in length are
used to prove identity between the suspect and the sample.
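
The asymmetry described above, where one differing sequence can exclude a suspect but many matching sequences are needed to support identity, can be sketched as a simple comparison in Python. This is a deliberately simplified illustration: it treats any base difference as significant, whereas the text calls for statistically significant differences, and the function name, the dictionary input format, and the returned labels are all hypothetical.

```python
def compare_panels(suspect, sample, min_matches=50):
    """Compare two panels of sequences keyed by primer-pair identifier.

    Returns (verdict, differing_keys). Any differing sequence excludes the
    suspect; identity is only supported when at least `min_matches` shared
    sequences all match.
    """
    shared = sorted(set(suspect) & set(sample))
    differing = [key for key in shared
                 if suspect[key].upper() != sample[key].upper()]
    if differing:
        return "excluded", differing
    if len(shared) >= min_matches:
        return "identity supported", []
    return "inconclusive", []
```

The default of 50 follows the preferred minimum stated in the text; a real comparison would also need to account for sequencing error before declaring a difference.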

EXAMPLE 15 

Positive Identification by DNA Sequencing

The technique outlined in the previous example may also
be used on a larger scale to provide a unique fingerprint-type
identification of any individual. In this technique, primers
are prepared from a large number of sequences from Examples 1,
9, 10 and/or 11. Preferably, 20 to 50 different primers are
used. These primers are used to obtain a corresponding number
of PCR-generated DNA segments from the individual in question
in accordance with Example 13. Each of these DNA segments is
sequenced, using the methods set forth in Example 1. The
database of sequences generated through this procedure
uniquely identifies the individual from whom the sequences
were obtained. The same panel of primers may then be used at
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any later time to absolutely correlate tissue cr 
other biological specimen with that individual.



EXAMPLE 16 
Southern Blot Forensic Identification 

The procedure of Example 15 is repeated to obtain a panel
of from 10 to 2000 amplified sequences from an individual and
a specimen. This PCR-generated DNA is then digested with one
or a combination of, preferably, four base specific
restriction enzymes. Such enzymes are commercially available
and known to those of skill in the art. After digestion, the
resultant gene fragments are size separated in multiple
duplicate wells on an agarose gel and transferred to
nitrocellulose using Southern blotting techniques well known
to those with skill in the art. For a review of Southern
blotting see Davis et al. (Basic Methods in Molecular Biology,
1986, Elsevier Press, pp 62-65).

A panel of ESTs or complete cDNA sequences from Examples

and/or 11, or fragments thereof of at least 1

radioactively or colorimetrically labeled using end-labeled
oligonucleotides derived from the ESTs, nick translated
sequences or the like using methods known in the art and
hybridized to the Southern blot using techniques known in the
art (Davis et al., supra). Preferably, at least 5 to 11 of
tnese 1 ace 1 e d probes are used, and mere treferabiv at least 
aoout 2C cr 2Z are used to provide a unioue oattern. The 
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B ot^lot_IdentificaU^^ 

aether technique for ^^jT^St" 
sequences disclosed herein utilizes 

technique. „ late d from nuclei of subject to be 

Genomic DBA X. of approx imately 30 bp in 

identified. Olxgonuol-t!-. ^ ^ s ^ eI>oes from the 

le ngth were synthesized that corr J? Bic Dm 

EEI s. ^^-^"tn the art. The 
through conditions Known ° ^ ^ polynucl eotide 

oligonucleotides ere end ^label ^ fcy spotting about 

Kinase (Pharmacia). Dot Bl sequ ences corresponding 

50 ng =DNA of preferably at le in Table 7 onto 

to a variety of the Sequence ID HO. pro man . fold 
ni trocellulose or the HKe using . ^^^^^^ filt er 
(BioKad, Richmond California) . ^ to 

containing the EST clone „ ith lab eled probe 

the filter, prehybridized and ^ al . ^ . The 

U sing techniques Known i» -the «^ hyb ridized with 

n ? labeled DNA fragments are * de tect minimal 

successively stringent conditions^^ ^ ^ ^ 
differences between the F ^ . dentifylng clones 

Tetramethylammonium chloride mismatches (Wood et 

containing small numbers of n .„„ , 198S) which 

.1., *ro=. Natl. SC1 - DSR " (1 A unique pattern of dots 

is hereby incorporated by reference^ A u *■ . 
distinguishes one individual from another 

EXAMPLE 18 
Alternative Identification Technique

EST sequences and the corresponding complete cDNA
sequences can be used to create a fingerprint for an
individual. Thus pools of EST sequences can be used in
forensics, paternity suits or the like to differentiate one
individual from another.
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Entire EST sequences can be used; similarly 
oligonucleotides can be prepared from EST sequences. In this
example, 20-mer oligonucleotides are prepared from 200 EST
sequences using commercially available oligonucleotide
services such as Oligos Etc., Wilsonville, OR. Patient cell
samples are processed for DNA using techniques well known to
those with skill in the art. The nucleic acid is digested
with restriction enzymes EcoRI and XbaI. Following digestion,
samples are applied to wells for electrophoresis. The
procedure, as known in the art, may be modified to accommodate
polyacrylamide electrophoresis; however, in this example,
samples containing 5 µg of DNA are loaded into wells and
separated on 0.8% agarose gels. The gels are transferred
using Southern blotting techniques onto nitrocellulose.

10 ng of each of the oligos are pooled and end-labeled
with 32P. The nitrocellulose is prehybridized with blocking
solution and hybridized with the labeled probes. Following
hybridization and washing, the nitrocellulose filter is
exposed to X-Omat AR X-ray film. The resulting hybridization
pattern will be unique for each individual.

It is additionally contemplated within this example that
the representative number of EST sequences can be varied for
additional accuracy or clarity.



EXAMPLE 19 

Identification of genes associated with hereditary diseases 



This example illustrates an approach useful for the
association of EST sequences with particular phenotypic
characteristics. In this example, a particular EST is used
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of these genetic diseases. 

Cells from patients with these diseases are isolated and 
expanded in culture. PCR primers from the EST sequences are 
used to screen genomic DNA and RNA or cDNA from the patients. 
ESTs that are not amplified in the patients can be positively 
associated with a particular disease by further analysis. 



EXAMPLE 20

Identification of a gene associated with
Angelman's disease

Angelman's disease (AD) is characterized by deletions on
the long arm of chromosome 15 (15q11q13) (Williams et al. Am.
J. Med. Genet. 32:339-345 (1989) hereby incorporated by
reference). The symptoms of the disease include developmental
delay, seizures, inappropriate laughter and ataxic movements.
These symptoms suggest that the disorder is a neurologic
deficiency. This prophetic example illustrates how ESTs,
preferably obtained from a cDNA library from human brain, may
be used in identifying the defective gene or genes associated
with Angelman's Disease. (The example is based on analogous
work with genomic DNA, rather than cDNA and ESTs, in
identifying the genetic defect associated with Angelman's
Disease.) This example also illustrates how EST sequences may
generally be used for identifying gene sequences associated
with an inherited disease that is mapped to a chromosome
location.

ESTs are screened using techniques described in Example
3 and Example 5 to identify those ESTs that localize to the
long arm of chromosome 15 and preferably localize to
chromosome 15 bands 15q11q13 from normal patients. ESTs that
bind to the long arm of chromosome 15 are hybridized to
chromosome 15 from AD patients. These studies are preferably
performed using either fluorescence in situ hybridization or
using somatic cell hybrids that contain fragments from the
long arm of chromosome 15 from AD patients. Those chromosome
15-specific ESTs that do not map to chromosome 15 from AD
patients are useful as markers for Angelman's Disease and can
be incorporated into diagnostics for genetic screening. These
ESTs are associated with chromosome deletions present in
Angelman's disease. Identification of the gene associated
with these AD negative ESTs and an analysis of the
polypeptides encoded by the genes from normal patients is
essential for providing gene or other therapies for AD
patients.
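
The comparison described above amounts to a set difference between
the ESTs that map in normal samples and those that still map in
each patient sample. The following is a minimal sketch under that
reading; the function names and EST labels are hypothetical and
the hybridization results are assumed to have been scored as
present or absent beforehand.

```python
def deletion_candidates(normal_ests, patient_ests):
    """Count, for each EST, how many patients fail to retain it.

    normal_ests  -- set of EST identifiers that map to the region
                    of interest in normal samples
    patient_ests -- dict mapping a patient label to the set of ESTs
                    that still hybridize in that patient
    """
    missing = {}
    for hits in patient_ests.values():
        for est in normal_ests - hits:
            missing[est] = missing.get(est, 0) + 1
    return missing


def shared_deletions(normal_ests, patient_ests):
    """ESTs absent from every patient: the strongest marker candidates."""
    counts = deletion_candidates(normal_ests, patient_ests)
    n_patients = len(patient_ests)
    return sorted(est for est, c in counts.items() if c == n_patients)
```

ESTs returned by shared_deletions correspond to the "AD negative"
ESTs of the text; ESTs missing in only some patients may still be
of interest but would need further analysis.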

Genetic diseases are not always accompanied by gene
deletions. Therefore, it is also important to use the ESTs
that bind to bands 15q11q13 from AD patients as tools to
identify the polymorphisms present within the disease
population. Restriction fragment length polymorphism (RFLP)
analysis can be performed on cells from AD patients or
from somatic cell hybrids created using the long arm of
chromosome 15. For a review of RFLP techniques see
Donis-Keller et al. (Cell 51:319-337 (1987) hereby incorporated
by reference).

DNA is isolated from the somatic cell lines or
from cells from AD patients. The DNA is digested with one or
more restriction enzymes according to techniques of
Donis-Keller et al. The resulting fragments are separated by gel
electrophoresis, denatured, transferred to nitrocellulose and
hybridized with the selected radio-labeled ESTs that localize
to the region of interest. The autoradiographic pattern is
compared both to a number of AD patients and to normal
patients. Common patterns of EST hybridization in AD patients
that are not present in normal patients indicate that the
genes associated with these ESTs are candidate genes affected
Alterations, including deletions and substitutions,
within gene sequences associated with bands 15q11q13 are
thus positively identified and associated with AD.
Wagstaff et al. were able to identify deletions and
substitutions in sequences encoding the GABA-A receptor protein
subunit from patients with Angelman's disease (Am. J. Hum.
Genet. 49:330-337 (1991)). It is likely that other genes
will additionally be associated with the disease.

EXAMPLE 21

Preparation and Use of Antisense Oligonucleotides

Antisense RNA molecules are known to be useful for
regulating translation within the cell. Antisense RNA
molecules can be produced from EST sequences or from the
corresponding gene sequences. These antisense molecules can
be used as diagnostic probes to determine whether or not a
particular gene is expressed in a cell. Similarly, the
antisense molecules can be used as a therapeutic to regulate
gene expression once the EST is associated with a particular
disease (see Example 20).

The antisense molecules are obtained from a nucleotide
sequence by reversing the orientation of the coding region
with regard to the promoter. Thus, the antisense RNA is
complementary to the corresponding mRNA. For a review of
antisense design see Green et al., Ann. Rev. Biochem.
55:569-597 (1986), which is hereby incorporated by reference. The
antisense sequences can contain modified sugar phosphate
backbones to increase stability and make them less sensitive
to RNase activity. Examples of the modifications are
described by Rossi et al., Pharmacol. Ther. 50(2):245-254
(1991).
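
Because the antisense RNA is complementary to the mRNA, its
sequence is the reverse complement of the sense strand. A minimal
sketch follows; the function name antisense_rna is hypothetical,
only the four unambiguous bases are handled, and a DNA-style
input (with T) is treated as the equivalent RNA.

```python
# Base-pairing table for RNA: A pairs with U, C pairs with G.
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(sense):
    """Return the antisense RNA, written 5' to 3', for a sense strand.

    The input may be given as mRNA (with U) or as the coding DNA
    strand (with T); T is read as U before complementing.
    """
    rna = sense.upper().replace("T", "U")
    unknown = set(rna) - set("ACGU")
    if unknown:
        raise ValueError("unsupported bases: " + "".join(sorted(unknown)))
    # Complement each base, then reverse so the result reads 5' to 3'.
    return rna.translate(_RNA_COMPLEMENT)[::-1]
```

For example, the sense fragment AUGGCC gives the antisense
fragment GGCCAU.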

Antisense molecules are introduced into cells that
express the gene corresponding to the EST of interest in
culture. In a preferred application of this invention, the
polypeptide encoded by the gene is first identified, so that
the effectiveness of antisense inhibition on translation can
be monitored using techniques that include but are not limited
to antibody-mediated tests such as RIAs and ELISA, functional
assays, or radiolabelling. The antisense molecule is
introduced into the cells by diffusion or by transfection
procedures known in the art. The molecules are introduced
onto cell samples at a number of different concentrations,
preferably between 1x10^-10 M and 1x10^-4 M. Once the minimum
concentration that can adequately control translation is
identified, the optimized dose is translated into a dosage
suitable for use in vivo. For example, an inhibiting
concentration in culture of 1x10^-7 M translates into a dose of
approximately 0.6 mg/kg bodyweight. Levels of oligonucleotide
approaching 100 mg/kg bodyweight or higher may be possible
after testing the toxicity of the oligonucleotide in
laboratory animals.
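
The text does not state how the culture concentration is converted
to a dose. One reading that reproduces the stated figure is
sketched below, under two assumptions that are not in the original:
an oligonucleotide molecular weight of about 6000 g/mol (roughly an
18- to 20-mer) and an even distribution through about one litre per
kilogram of bodyweight. The function name and parameters are
hypothetical.

```python
def culture_to_dose_mg_per_kg(molar_conc, mol_weight=6000.0,
                              litres_per_kg=1.0):
    """Convert an in-culture concentration to an approximate dose.

    molar_conc    -- inhibiting concentration in culture, mol/L
    mol_weight    -- assumed oligonucleotide molecular weight, g/mol
    litres_per_kg -- assumed distribution volume per kg bodyweight
    """
    # mol/L * g/mol gives g/L; multiply by 1000 for mg/L.
    mg_per_litre = molar_conc * mol_weight * 1000.0
    return mg_per_litre * litres_per_kg
```

Under these assumptions 1x10^-7 M corresponds to about 0.6 mg/kg,
and the upper end of the tested range, 1x10^-4 M, to about
600 mg/kg.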

The antisense can be introduced into the body as a bare
or naked oligonucleotide, oligonucleotide encapsulated in
lipid, oligonucleotide sequence encapsidated by viral
protein, or as oligonucleotide contained in an expression
vector such as those described in Example 23. The antisense
oligonucleotide is preferably introduced into the vertebrate
by injection. It is additionally contemplated that cells from
the vertebrate are removed, treated with the antisense
oligonucleotide, and reintroduced into the vertebrate. It is
further contemplated that the antisense oligonucleotide
sequence is incorporated into a ribozyme sequence to enable
the antisense to bind and cleave its target. For technical
applications of ribozyme and antisense oligonucleotides see
Rossi et al.



EXAMPLE 22

Preparation and Use of Triple Helix Probes

Triple helix oligonucleotides are used to inhibit
transcription from a genome. They are particularly useful for
studying alterations in cell activity as it is associated with
a particular gene. The EST sequences or complete sequences of
the present invention or, more preferably, a portion of those
sequences, can be used to inhibit gene expression in
individuals having diseases associated with a particular gene.
Similarly, a portion of the EST or corresponding gene sequence
can be used to study the effect of inhibiting transcription of
a particular gene within a cell. Traditionally, homopurine
sequences were considered the most useful. However,
homopyrimidine sequences can also inhibit gene expression.
Thus, both types of sequences from either the EST or from the
gene corresponding to the EST are contemplated within the
scope of this invention. Homopyrimidine oligonucleotides bind
to the major groove at homopurine:homopyrimidine sequences.
As an example, 10-mer to 20-mer homopyrimidine sequences from
the ESTs can be used to inhibit expression from homopurine
sequences. SEQ ID NOs such as 282 and 240 contain
homopyrimidine 15-mers. Moreover, the natural (beta) anomers
of the oligonucleotide units can be replaced with alpha
anomers to render the oligonucleotide more resistant to
nucleases. Further, an intercalating agent such as ethidium
bromide, or the like, can be attached to the 3' end of the
alpha oligonucleotide to stabilize the triple helix. For
information on the generation of oligonucleotides suitable for
triple helix formation see Griffin et al. (Science 245:967-971
(1989), which is hereby incorporated by this reference).
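
Candidate homopyrimidine stretches of the kind mentioned above can
be located by scanning a sequence for uninterrupted runs of C and
T. The sketch below is illustrative only; the function names are
hypothetical, and the default minimum length of 15 simply mirrors
the 15-mers mentioned in the text.

```python
import re


def homopyrimidine_runs(seq, min_len=15):
    """Return (start, run) pairs for C/T-only runs of at least min_len."""
    pattern = re.compile("[CT]{%d,}" % min_len)
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


def homopurine_partner(run):
    """Reverse complement of a C/T run: the homopurine strand it pairs with."""
    return run.translate(str.maketrans("CT", "GA"))[::-1]
```

A run shorter than min_len is ignored, so lowering min_len to 10
would report the 10-mer to 20-mer range given in the example.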

The oligonucleotides may be prepared on an
oligonucleotide synthesizer or they may be purchased
commercially from a company specializing in custom
oligonucleotide synthesis. The sequences are introduced into
cells in culture using techniques known in the art that
include but are not limited to calcium phosphate
precipitation, DEAE-Dextran, electroporation,
liposome-mediated transfection or native uptake. Treated cells are
monitored for altered cell function. These cell functions are
predicted based upon the homologies of the gene, corresponding
to the EST from which the oligonucleotide was derived, with
known gene sequences that have been associated with a
particular function. The cell functions can also be predicted
based on the presence of abnormal physiologies within cells
derived from individuals with a particular inherited disease,
particularly when the EST is associated with the disease using
techniques described in Example 20.

EXAMPLE 23

Gene expression from DNA Sequences Corresponding to ESTs

A gene sequence of the present invention coding for all
or part of a human gene product is introduced into an
expression vector using conventional technology. (Techniques
to transfer cloned sequences into expression vectors that
direct protein translation in mammalian, yeast, insect or
bacterial expression systems are well known in the art.)
Commercially available vectors and expression systems are
available from a variety of suppliers including Stratagene (La
Jolla, California), Promega (Madison, Wisconsin), and
Invitrogen (San Diego, California). If desired, to enhance
expression and facilitate proper protein folding, the codon
context and codon pairing of the sequence may be optimized for
the particular expression organism, as explained by Hatfield
allow efficient stable transfection. The vector includes the
Herpes Simplex Thymidine Kinase promoter and the selectable
neomycin gene. The cDNA is obtained by PCR from the bacterial
vector using oligonucleotide primers complementary to the cDNA
and containing restriction endonuclease sequences for PstI
incorporated into the 5' primer and BglII at the 5' end of the
corresponding cDNA 3' primer, taking care to ensure that the
cDNA is positioned in frame with the poly A sequence. The
purified fragment obtained from the resulting PCR reaction is
digested with PstI, blunt ended with an exonuclease, digested
with BglII, purified and ligated to pXT1, now containing a
poly A sequence and digested with BglII.

The ligated product is transfected into mouse NIH 3T3
cells using Lipofectin (Life Technologies, Inc., Grand Island,
New York) under conditions outlined in the product
specification. Positive transfectants are selected after
growing the transfected cells in 600 µg/ml G418 (Sigma, St.
Louis, Missouri). The protein is preferably released into
the supernatant. However, if the protein has membrane binding
domains, the protein may additionally be retained within the
cell or expression may be restricted to the cell surface.

Since it may be necessary to purify and locate the
transfected product, synthetic 15-mer peptides synthesized
from the predicted cDNA sequence are injected into mice to
generate antibody to the polypeptide encoded by the cDNA.

If antibody production is not possible, the cDNA sequence
is additionally incorporated into eukaryotic expression
vectors and expressed as a chimeric with, for example,
β-globin. Antibody to β-globin is used to purify the chimeric.
Corresponding protease cleavage sites engineered between the
β-globin gene and the cDNA are then used to separate the two
polypeptide fragments from one another after translation. One
useful expression vector for generating β-globin chimerics is
pSG5 (Stratagene). This vector encodes rabbit β-globin.
Intron II of the rabbit β-globin gene facilitates splicing of
the expressed transcript, and the polyadenylation signal
incorporated into the construct increases the level of
expression. These techniques as described are well known to
those skilled in the art of molecular biology. Standard
methods are published in methods texts such as Davis et al.
and many of the methods are available from the technical
assistance representatives from Stratagene, Life Technologies,
Inc., or Promega. Polypeptide may additionally be produced
from either construct using in vitro translation systems such
as In vitro Express™ Translation Kit (Stratagene).

EXAMPLE 24

Production of an Antibody to a Human Protein

Substantially pure protein or polypeptide is isolated
from the transfected or transformed cells as described in
Example 22. Concentration of protein in the final preparation
is adjusted, for example, by concentration on an Amicon filter
device, to the level of a few micrograms/ml. Monoclonal or
polyclonal antibody to the protein can then be prepared as
follows:

A. Monoclonal Antibody Production by Hybridoma Fusion 

Monoclonal antibody to epitopes of any of the peptides
identified and isolated as described can be prepared from
murine hybridomas according to the classical method of Kohler,
G. and Milstein, C., Nature 256:495 (1975) or derivative
methods thereof. Briefly, a mouse is repetitively inoculated
with a few micrograms of the selected protein over a period
of a few weeks. The mouse is then sacrificed, and the antibody
producing cells of the spleen isolated. The spleen cells are
fused by means of polyethylene glycol with mouse
of the wells by immunoassay procedures, such as ELISA, as
originally described by Engvall, E., Meth. Enzymol. 70:419
(1980), and derivative methods thereof. Selected positive
clones can be expanded and their monoclonal antibody product
harvested for use. Detailed procedures for monoclonal antibody
production are described in Davis, L. et al. Basic Methods in
Molecular Biology, Elsevier, New York, Section 21-2.

B. Polyclonal Antibody Production by Immunization

Polyclonal antiserum containing antibodies to
heterogeneous epitopes of a single protein can be prepared by
immunizing suitable animals with the expressed protein
described above, which can be unmodified or modified to
enhance immunogenicity. Effective polyclonal antibody
production is affected by many factors related both to the
antigen and the host species. For example, small molecules
tend to be less immunogenic than others and may require the use
of carriers and adjuvant. Also, host animals vary in response
to site of inoculations and dose, with either inadequate or
excessive doses of antigen resulting in low titer antisera.
Small doses (ng level) of antigen administered at multiple
intradermal sites appear to be most reliable. An effective
immunization protocol for rabbits can be found in Vaitukaitis,
J. et al. J. Clin. Endocrinol. Metab. 33:988-991 (1971).

Booster injections can be given at regular intervals, and
antiserum harvested when antibody titer thereof, as determined
semi-quantitatively, for example, by double immunodiffusion in
agar against known concentrations of the antigen, begins to
fall. See, for example, Ouchterlony, O. et al., Chap. 19 in:
Handbook of Experimental Immunology, D. Wier (ed), Blackwell
(1973). Plateau concentration of antibody is usually in the
range of 0.1 to 0.2 mg/ml of serum (about 12 µM). Affinity of
the antisera for the antigen is determined by preparing
competitive binding curves, as described, for example, by
Fisher, D., Chap. 42 in: Manual of Clinical Immunology, 2d Ed.
(Rose and Friedman, eds.) Amer. Soc. For Microbiol.,
Washington, D.C. (1980).
Antibody preparations prepared according to either
protocol are useful in quantitative immunoassays which
determine concentrations of antigen-bearing substances in
biological samples; they are also used semi-quantitatively or
qualitatively to identify the presence of antigen in a
biological sample.

EXAMPLE 25 

Identification of Tissue Types or Cell Species by Means of 
Labeled Tissue Specific Antibodies 

Identification of specific tissues is accomplished by the
visualization of tissue specific antigens by means of antibody
preparations according to Example 24 which are conjugated,
directly or indirectly, to a detectable marker. Selected
labeled antibody species bind to their specific antigen
binding partner in tissue sections, cell suspensions, or in
extracts of soluble proteins from a tissue sample to provide
a pattern for qualitative or semi-qualitative interpretation.

Antisera for these procedures must have a potency
exceeding that of the native preparation, and for that reason,
antibodies are concentrated to a mg/ml level by isolation of
the gamma globulin fraction, for example, by ion-exchange
chromatography or by ammonium sulfate fractionation. Also, to
provide the most specific antisera, unwanted antibodies, for
example to common proteins, must be removed from the gamma
globulin fraction, for example by means of insoluble
immunoabsorbents, before the antibodies are labeled with the
marker. Either monoclonal or heterologous antisera is
suitable for either procedure.

A. Immunohistochemical Techniques
is preferred, but antibodies can also be labeled with an
enzyme that supports a color producing reaction with a
substrate, such as horseradish peroxidase. Markers can be
added to tissue-bound antibody in a second step, as described
below. Alternatively, the specific antitissue antibodies can
be labeled with ferritin or other electron dense particles,
and localization of the ferritin coupled antigen-antibody
complexes achieved by means of an electron microscope. In yet
another approach, the antibodies are radiolabeled, with, for
example, 125I, and detected by overlaying the antibody treated
preparation with photographic emulsion.

Preparations to carry out the procedures can comprise
monoclonal or polyclonal antibodies to a single protein or
peptide identified as specific to a tissue type, or antibody
preparations to several antigenically distinct tissue specific
antigens can be used in panels, independently or in mixtures,
as required.

Tissue sections and cell suspensions are prepared for
immunohistochemical examination according to common
histological techniques. Multiple cryostat sections of the
unknown tissue and known control are mounted and each slide
covered with different dilutions of the antibody preparation.
Sections of known and unknown tissues should also be treated
with preparations to provide a positive control, a negative
control, for example, pre-immune sera, and a control for
non-specific staining, for example, buffer.

Treated sections are incubated in a humid chamber for 30
min at room temperature, rinsed, then washed in buffer for
45 min. Excess fluid is blotted away, and the marker
developed.

If the tissue specific antibody was not labeled in the
first incubation, it can be labeled at this time in a second
antibody-antibody reaction, for example, by adding
fluorescein- or enzyme-conjugated antibody against the
immunoglobulin class of the antiserum-producing species, for
example, fluorescein labeled antibody to mouse IgG. Such
labeled sera are commercially available.
The antigen found in the tissues by the above procedure can
be quantified by measuring the intensity of color or
fluorescence on the tissue section, and calibrating that
signal using appropriate standards.
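
Calibrating a measured signal against standards can be sketched
as a straight-line fit followed by inversion of that line. The
linear response is an assumption made here for illustration (the
text does not specify the form of the calibration), and the
function names are hypothetical.

```python
def fit_line(amounts, signals):
    """Least-squares line signal = slope * amount + intercept."""
    n = len(amounts)
    if n < 2 or n != len(signals):
        raise ValueError("need at least two paired standards")
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    if sxx == 0:
        raise ValueError("standards must span more than one amount")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(amounts, signals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(signal, slope, intercept):
    """Estimate the antigen amount that would give the measured signal."""
    return (signal - intercept) / slope
```

In use, the standards of known antigen amount are measured first,
fit_line is applied to them, and each tissue-section reading is
then passed to quantify with the fitted slope and intercept.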

B. Identification of Tissue Specific Soluble Proteins 

The visualization of tissue specific proteins and
identification of unknown tissues from that procedure is
carried out using the labeled antibody reagents and detection
strategy as described for immunohistochemistry; however, the
sample is prepared according to an electrophoretic technique
to distribute the proteins extracted from the tissue in an
orderly array on the basis of molecular weight for detection.

A tissue sample is homogenized using a Virtis apparatus;
cell suspensions are disrupted by Dounce homogenization or
osmotic lysis, using detergents in either case as is required
to disrupt cell membranes, as is the practice in the art.
Insoluble cell components such as nuclei, microsomes, and
membrane fragments are removed by ultracentrifugation, and the
soluble protein-containing fraction concentrated if necessary
and reserved for analysis.

A sample of the soluble protein solution is resolved into
individual protein species by conventional SDS polyacrylamide
electrophoresis as described, for example, by Davis, L. et
al., Section 19-2 in: Basic Methods in Molecular Biology (P.
Leder, ed), Elsevier, New York (1986), using a range of
nitrocellulose blots is stained with Coomassie Blue dye to
visualize the entire set of proteins for comparison with the
antibody bound proteins. The remaining nitrocellulose filters
are then incubated with a solution of one or more specific
antisera to tissue specific proteins prepared as described in
Example 24. In this procedure, as in procedure A above,
appropriate positive and negative sample and reagent controls
are run.

In either procedure A or B, a detectable label can be
attached to the primary tissue antigen-primary antibody
complex according to various strategies and permutations
thereof. In a straightforward approach, the primary specific
antibody can be labeled; alternatively, the unlabeled complex
can be bound by a labeled secondary anti-IgG antibody. In
other approaches, either the primary or secondary antibody is
conjugated to a biotin molecule, which can, in a subsequent
step, bind an avidin conjugated marker. According to yet
another strategy, enzyme labeled or radioactive protein A,
which has the property of binding to any IgG, is bound in a
final step to either the primary or secondary antibody.

The visualization of tissue specific antigen binding at
levels above those seen in control tissues to one or more
tissue specific antibodies, prepared from the gene sequences
identified from EST sequences, can identify tissues of unknown
origin, for example, forensic samples, or differentiated tumor
tissue that has metastasized to foreign bodily sites.

The entire contents of all references cited above are
hereby incorporated by reference.

While the present invention has been described in some
detail for purposes of clarity and understanding, one skilled
in the art will appreciate that various changes in form and
detail can be made without departing from the true scope of
the invention.

VII. Correlation of EST and Clone Identifiers

The EST sequences of the present invention are identified
herein by SEQ ID NO, and are identified in the GenBank
database by a different number, are identified in the
inventors' lab (and upcoming publications) by EST number, and
clones have been submitted to the American Type Culture
Collection (Rockville, Maryland USA) under clone names. Table
12 cross references those different numbers for the ESTs from
cDNA, SEQ ID NOS 1-315.
Table 12. SEQ ID NO Cross References
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ATCCTAGTTG ATGGTCTGGG TTATCAGAGG AGCAAAAACA TTTAAGTGTC AAATAATGCT 120 

CATTGTCTCC CTGGGATTTC TAAACAGAAA AAATGAAGAA AGAGGCAGAG AAGAGCTTCA 180

CAAGGTGTGT GCCAGCTCTG CATCATTTCC AGCTGCTCAA CCACCATTTC TCCCATTTTA 240

GGTCCCCAAA AGTAGGAGGT GGGGCCTCAC AGAGCTGCTG TGGGCTTTGG GTATCAAAAG 300

CTGCAGCCAC CATATGGGGC ACTCCTGGCT GGTGTACAGG GTGGGCATTG CCCAGGTCTT 360

TT 362 

(2) INFORMATION FOR SEQ ID NO : 2 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 214 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 2 : 

GTTTTNCTTT TTTCTTAGCT TCATTTCTCT TAAAAAACAA GGAACAAGAA AACATTGCAC 60 

CAGCGTTCTA AGCCTCAAAC AAAANACAAA ACAAATCCCC CTGCGAAGAA CAATAAACTT 120 

TACATCTCTT TGGCAACAAT AACTTAAAAT CACCCAACTT CCATTCGCTC CAACCACAGC 180 

AGTTAGTTAG TTACAAAAAT ATTCCNTGTG CTGC 214 

(2) INFORMATION FOR SEQ ID NO : 3 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 344 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 3 : 

ATTAATAGGA AAGATGATTG TATAGATGGT GGGCTATTAA CTCAGATCAG GATGAGAATC 60 

GGGAGTGCCT TTACATGTGT GGTACCCAAA TGGGTGGTTG GATATAAGAG TAACAAAAGG 120

ACTGAAAGGG TTAAAAAAGA AAGAAAAAAA AAAAACTCCC TGGTTGGGAG GGTGTTAAGT 180

ATCGAGTGTT TTTCCAAACC ATTCCTCCTC TGCTCACCTA CCCCTAGGTG ATTAAAGGAG 240

ATAACTTTTA AAAAAGAAAG AATTGGCTCA AAGGTACTGT AAATTCTAGG ATTATATACC 300

TXTATATAGG TTCATTCCCT GATCCCTGTA TTATCAAGGC ACAG 344

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 352 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

GO AO ACT CAT CTGTOCC00A OCT CO AG AT C OOTGGAGGOA GCTGACCAAT GATGGGCGGT 60

GAOCCGGTAA CCGAGGCGGO AAGGAGGCCA GGTAGTOOCG GOAOCTOTCA CTOTGOAGAG 120

ACCAGCGGGT ZZGZGGGAGG COTGTGGGTO ACACGTAGGG GOTAGAGIOA GOOTGOATCO 180

TGCCCAOOGG GOTOCACTTG GAGATCAGOA GGAGGGOGAG TGTGGGA000 CTGCTGCOAO 240

CTCTOCTGGG CITGTKTOOT TTOTGG.AAAT TAAGAAGGTG TGGTCOAGAO CCAAGAGGAG 300

CAAIAAGAAA CCTCGTGTGO CAGCTTCTTA AGGCTKGCAG TGCAAGACCC CA 352

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 562 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

TA„ ATATATATTC AOAGAAAATC ATATTGOATA TACTCTTTOT CCACATCATA 60 

.GGGZG TTGGOCTCTC TAGOACACAA GGGAAGOAGG CCAAACTIOT CATATTTTOA 12 2 

^- ^ . w ■ — , w .A- 1 . ^ G T AA T A G G AA C C T T ' 



. - w ^ i-r.cLA. i .IC T G ^ G AA. G ^- A 



^A^.A .AGiTC^AAA -GGAAIACGG ATCTTT..AT TTAAATTOCA ATCATv 



. ^ ^ _ ^ _ ^- . . „ — - - - - ^ A ^ ^- A ^- G o G A A G.G'cCT. G C 2 .AA GGGGZZ A ^ G A G 0 ^ w A 
*~ ' " v: ' w " ~ - ~- -~ - ^ ~ ~ ^ ~ ^ w ^ A G 0 OTA 20 C C AG G G G N C A C C C C A 0 T T AG G TT G TTTT G T 
TGAAAGAGTA ATTACCATTT ACTGAAGCAC TTATCTGTCC TACACTGATG GGAGTAAATG 240

CTTCTCATAG GTTATCTCAT GTACATTATG CCACTTTNAC TTAAAATGAT CACAATTNAG 300 

TGCTATAGGT TTTTGGGTTA ATGTTTTCCC NGGGGGAGTT GTTAAAAACA TGGCATTTC 359 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 218 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 7 : 



AACTTGCAAC ATAAATACTA GAAAAAGAGA AAATATCATC AAAATACAAA TAACTGTTAG 60

AAATCATTGC TCAAAAGAAR AACCTGGCAA TGCATGATTA CGAAATGCAA AAGAMGATAC 120

AGTTGCTCTC TGTATATGCG CTTTCCACAT CCACAGATTC AAACAACTGT GGATAAAAAA 180

GGATTTTTCA ATGCCATTAA ACAVCAATGC AACAGTAA 218 



(2) INFORMATION FOR SEQ ID NO : 8 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 345 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:



CTACAATAGA AGGCAAACTA TGTCCCTCCT TTGCTCAGAA ACTTTTAATA TCTKCCTATT 60 

TCCCCATGTA AAAGCCAATC CTCAACCACA GTGTAGAAGG GCTATCCATT TCTAGCTACA 120 

CATCTCCTCA GTCACTGCCC CCAGCCCCAG TACTTGGGGA CTTTGCCCTT GCAGTTCCCT 180 

GTGCCAGCAA ACTCTTCCTC CAGATGTCCA CATGACTCAC CCNNCTCCTT CAGGGGTCTT 240 

CTCAAATGTC ACTTTACCAG AGGTGGCTTC CCTGACCATC CTGTATAAAT AGCATCACCC 300 

TACCTCCTAT CTCTCTCTCT AATGTCTCAG GAATTCGATA TCAAG 345 



(2) INFORMATION FOR SEQ ID NO : 9 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 189 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS : double 
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

GTGAACAGAC TAAGGCCTTT NTGGAGGCCC AGAATAAGAT TACTGTGCCA TTTCTTGAGC 60

AGTGTCCCAT CAGAGGTTTA TACAAAGAGA GAATGACTGA ACTATATGAT TATCCCANGT 120

ATAGTTGCCA CTTCAAGAAA GGAGAACGGT GTTTTTATTT TTACAATACA GGNTTTNAGA 180



r 
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(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 339 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

VCTVTCTVCC AACTTCATTC AGATATTGAC TCTGGTGATG GGAACATTAA ATACATTCTC 60 

TCAGGGGAAG GAGCTGGAAC CATTTTTVTR ATTGATGACA AATCAGGGAA CATTCATGCC 120 

ACCAAGACGT TGGATCGAGA AGAGAGAGCC CAGTACACGT TGATGGCTCA GGCGGTGGAC 180 

AGGGACACCA ATCGGCCACT GGhGCCACCG TCGGAATTCA TTKTCAAGGK CCAGGACATT 240 

AATGACAGTC CTCCGGAGGT TTCCTGCACG AGACCTATCA TGCCAACTGT GCCSTGTARA 300 

GGTCCAATKT TGGGTGSTGT ACGGTAGTGG GGAGGCCTG 339 
(2) INFORMATION FOR SEQ ID NO : 14 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 342 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

GGGVGCAAAG TAGCAGATTC TAGTAAAGGA CCAGATGAGG CAAAAATTAA GGCACTCTTG 60

GAAAGAACAG GCTACACACT TGATGTGACC ACTGGACAGA GGAAGTATGG AGGACCACCT 120

CCAGATTCCG TTTATYCAGG TCAGCAGCCT TCTGTTGGCA CTGAGATATT TGTGGGAAAG 180 

ATCCCAAGAG ATCTATTTTG AGGATGAACT TGTTCCATTA TTTGAGAAAG CTTGGACCTA 240 

TATGGGATCC TTCGTCTAAT GATGGATCCA CTCACTGGTC TCAATAGAGG TTAATGCGTT 300 

TGTCACTTTT TTGTACAAAA GGAGCARGCT CAAGGAGGGC TG 342 

(2) INFORMATION FOR SEQ ID NO : 15 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 354 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

ATGTTGATGC TGAAATTVAA GATCCACCAA TTCCAGAAAA ACCATGGAAG GTTCATGTGA 60 

AATGGATTTT GGACACTGAT ATTTTCAATG AATGGATCAA TGAGGAGGAT TATRAGGTGG 120

ATGAAAATAG GAAGCCTGTR AGTT7YCGTC AGC ~C A777 I AA CC AAGAAT GAAGAGCCAG 180 

TCAGAAGTCC AGAAAGAAGA GATAGAAAAG CATCASCTAA TGCTCGAAAG AGGAAACATT 240
GATTCATTCT GATGCCAACC CCCATCCATC ATGCCATGGA TCGCTCTAGA CTTCTTCCCT 120
TGTAACCTCC CACTCAAACA GTGAGAAACC TTTGCCCAGT ATGTTTTGGA GTAACCTCAC 180
TGGGAGTTTG CAGTCCCACT AGATGAATGC CAACCCATTT GTTCATTTAA AAGGACTTTT 240
GG^CCATAG AGCAATGGCT GGGCTGGGTC TVGCACGTTC ATCTTGACTG AAACAATTGG 300
CCATGAAGGC ACTTGCCAAG GAAACTCTAG GGGCCACAAG GGTCCTGGGT GCTTGC 356
(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 339 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
CATGCTTCCA TTTTTTTTAG TTTTAAACCA CCAAACCAAT ATTTTYCCTT TAAATTTTAA 60
TCTTATAATA TAGAAATCTT ATGTAAATGA AATTTTGTCA TGTTTCAAAT AAAGAGAACT 120
GAAGTAGAAA ATAGAAATGC CAGTAAACAA CATAATGTTT AATTTACAAC TTACATTAGG 180
GGTTTGGGGG VATGCTAATT ATATATTGAG AATATACATT AGAACTCTTC AAAATGGGCT 240
CTTCTAATGA GGTCACTACT GAACATAATT GTTCCCTCTT CTGTTAAATA GAATAGGTTT 300
AAATGACTAG TCCAAATGGA ATTATTGCCT TCTKGTTAA 339
(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 437 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
AG^CAAGGG AACTCAGCAG CCCCTCCCTT CCCATCAGCT GTTCCTGAGA GATGCAATAT 60
AGTAGTCATC GACATCATCC TTATCAACAG CATCATCACT CAGACAGTGG TGAAAGTCTT 120
TCTTCACAAG GAAAAACAAA GATAAAGAAA TACATGAGCA TTAATCAGAA ATTTTCAAAG 180
CTTGGMTCT AATGATATGC ATTATCATTA GACATTCAAA TGCTATACAT CTTCTGATGA 240
AGCCTCCTTG ACAGCAGCTA CACTTATTTC ACATTAGAAT GCCTAGAGAA ATCCTGACTG 300
CCCAGCTTGG TCATGCGACC TTCCCCACTC TCCTCTTGGA GGAATGAAAA GATGTGGCGG 360
CTTTCTACTT TTGCTACTGA GCTGGGGTAT ATGGCTAGGT CCACTTTCTA AGGGGCTTGG 420
AAGGGTTATT CCATCTG 437
(2) INFORMATION FOR SEQ ID NO: 21:
(i) SEQUENCE CHARACTERISTICS:
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CTGAAGAGTT TCCCAGAACA TTCTTGTGAA AAGGAATGCC TCCCAACAAT GGAGAGCAAC 240 
AATAGCAACA GGCATCTGAA TCAGCCTGGC CTCTGAAAAC AGACCANAGA GGAGTTTATC 300 
TGTTTCTTCC AGTGGAGGAA GG 322 
(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 113 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
CCTGAAATCG GAGTCTTTTG GACTGACTCC AAATTCAATG GGTGGCACAG GCAGCACGGA 60 
GTCCACGTGA ATCTCCACCC CGTTAACAGG CGGGACGACA GCCCCTTGCA GCC 113 
(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 399 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

GGAAAGAATG AAGGAAAAAC AAGACAAAAT CTACTTCATG GCTGGGTCCA GCAGAAAAGA 60 

GCAGACGCTG GCCTCAGACA CAGACAGCAG TCTTGATGCC TCGACGGGAC CCCTTGAAGG 120 

CTGTCGATGA TAGGTTAGAA ATAGCAAACC TGTCAGCATT GAAGGAACTC TCACCTCCGT 180 

GGGCCTGAAA TGCTTGGGAG TTGATGGAAC CAAATAGAAA AACTCCATGT TCTGCATGTA 240 

AGAAACACAA TGCCTTGCCC TACTCAGACC TGATAGGATT GCCTGCTTAG ATGATAAAAT 300 

GAGGCAGAAT ATGTCTTGAA GAAAAAANTT GCAAGCCACA CTTCTNGAGA TTTTGTTCAA 360 

GATCCATTTC AGGGTGAGCA GTTAGAGTAG GTTGAATTT 399 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 355 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

GAITGGTATA GGGGGAACAA TGGATTGATA GCCTTAATAT AGAAATAGTT CCAGCAGGCC 60 

AGATGCAGTG GCTCAATTCT GTAAACCCAG TGCTCTGCAC AGCTAGGAAG GAAGATCACT 120 

TGGGGGCAGG AGTTCAAGGC TCCAGTGAGC CATGATCACG CCACTKCCTC CAGCCTGGGT 180 
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GACAGAOTNA GGZCCTZ7CT CTAAAAAATG AAATAGCTCC ATCAAGTCAA TAATTAAAAG 2-^0 

ttcaacagcc caacaganca aaaattgtaa atgancacaa attagaaaat gtacaaatta ooc 

AATATIAATG ACCCATAACC CTATAAGGGA AAGTTTAACC TCTCTAGTAT TZZIT 3;5 
(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 322 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

AAAACGTGAT CACCACAGCT CCGTTOCTGC AGTGACACTT AACATACTCA GCATCTTCAT 60 

GA-ATTCTGAA. TAATTTACTG ATCGTAAAGT CTAAAAGTAT CAATTTCAGG TGAGCAGTTT 12: 

.AAA.CAGAA A_A T A G T 0 A_AT AGTTA.ATCAT GACTCTTCAC GGTATTTCCC TCACGTCCTC ISC 

~ G.AAGA.GTTT CCCAGAACAT TCTTGTGAAA. AGGAATGCCT CCCA.ACAATG GAGGAGCAAC 2^0 

AATAGCAACA GGCATCTGAA TCAGCCTGGG CTCTGAAAAC AGAOCAAAGA GGNGTTTTTO 300 

* — ^ ^ l. ^ ^ o A. c- G A_A G G *; " n 
(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 287 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

^ ° ~ - ~ -~- - - ^ w- ^ _ ^ ■_- . M G T o A. A. ~ u A_A T G G A. T T 0 AAA. 0 A. G G Z 0 A-A G A. c I 

- - _ -.v- M.-.- — .AAG. TAGGAAG0T0 GTATAGAAA.T CTOGATGaIA TATGGTOGCT III 

. - v.-.. -A^AGCGAG 0AT0G0AAGT AGGTGGATTA CTTTAOACTT TTTTAGATCA 150 

^ ^-A.GTCTTGA. AGACAAATTA ATCTCATA.TA TA-ACTCTA_AA 0A„A2A.TATTT 2-2 

. _ . ^ _ .-^r^-. ^ ^ ^- A _ ~„A ^ o _ ^ ^- A. „ 0 A_AA. T A. T T A. G 3 A. 0 0 T T T ~ z ~ 



(A) LENGTH: 282 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
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GGGCTCTCCA TTGCCTGCCC TTGCCTCTTT CTAGCCTGTT ATTTCTAGGC TCCTCTGAAT 180 
AAATCTCAGG TTTCCTACTG TCATGCCTTT AGTTCAAAAA TGAGAATCTG CCCTACAGTG 240 
CTGGCCTCCT TCCGGCCTGA AAGCCAGCAC CTTKCGACCC GG 282 
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 345 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

GAAGCTGGTG AATACATTTC AAGACACAAC ATGGCACCTG TGTCTAGCTC TATGGTACAA 60 

CATGGTACTA TGACACATAT AATGGGTTGC CAGATGGGGA AGGCAGCTTC TCTGCAACTG 120 

AGCTGAGATC TCAAAATAGA CAATGTCAAG ATGGAATGAG AAGGGAAAAA CAGCATGTGT 180 

AGACAGGTAG TGACAAAAGG CTAATTAAGG ACTGAAAGAA ACCAGTGGCC AACAAGGGAA 240 

TCTACGGGTG ATAAAGATAA GACGGTGAGA GAGATAAGGC TAGATTGTAT AAGGCTTGAC 300 

AGACCATAGC AAGATAAGCA AGGACCTGTG TCCTGTTAAC CATTT 345 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 343 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

ATAAAATTGG TCTGGGTACC CTAAGGTGTT TGCKTTGATA GAAAATTGAC ACCCCAAACT 60 

AAGTGTTCTA CTTAGCTTCT ACAATAGTTA TTCCTAGACC TTAGATTAGT CATTACATTT 120 

TTATTTAAGG TACTATGTTA CTTTCATGAC TACAAAATGA GGCACTCGTA CAAAACAGGA 180 

ATGAAAACAT ACATATACTG TCTTGTCTTT ATGTCGTATT AATGCCAAAG ATATTGTCAG 240 

GGATTATTTT AAAGAAGCCC TTACTCATGA TGGCTATTTT TAAAAATGGC ACAGGACAGT 300 

AACAGGCTGA AAAGAAACAC CTGGTTTGAG GGGCCAAATT AAG 343 
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 153 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
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AAA.GGA.TGGT 2AGGAGAAG2 2ACGTCTGGT AAA. G T G A C A 7 77GAGANGA2 222TGAAGGN 6C 

GGGGGGTTGA GTCATGTGGA GATGTTGAGG AAGAGTTTAC TGGGACAGGG AACTGCAAGG 121 

KCAAAGTCCG CAAGTA2TAG GGG7GGGGGC AG7 15 2 

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 257 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

*"* ' J - ~ TA.TOG 0 AGGT G 0 A G 0 0 AAA 0 A C AAA G C T T 2 A G G A. 2 AAA. T T G T A 0 AAA. C T 7 i 0 

7 A.GAA7 07 GG GA777AAA77 TAAAATA7GA 7A2A7AAAAA TC7ACAAAAA AC7GATAAAA 122 

A-CAAG2A.2A GNTA.OCAGGA. 7TGAAACTT.A TAATAAT02A TG707GAAAG 0-GA.G7077G7 1 £ C 

.^'-OTTTCAA GTGCTT77A7 T07G07A7GG AACAG70AAA A7GGAAGN7G 7AAAG077TG 200 

- - - -~ - * - - AAATTA.7 - ~" 

(2) INFORMATION FOR SEQ ID NO: 34:

(A) LENGTH: 507 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

. ^ „ . w ^ ~_ ^ ~. ^ ^ ^ w ^ 2 0721A. G 0 7 G 1 2 7 C 7 1 7270" AAAAA 7 A 0 0 N AA 0 A. 7 2 z 2 



~ ^ „ . 



- ~- ^ - - - — ^ ■ — ■ ~ ^ v.- w L _ ^ _ A. ^ v_ ^ . _ A. v_ v 

— .« ^ ^ — ^. . . ^ AAA G . 2 * G 2 T A. 2 G T 2 A. 1 0 G 7 G 2 7 7 G GTA. G 7 G AAAA7 G G " 
^ * ^* — - '■ — -~- — - ~~ - ~ — - ^- w ~. ^- ^ ^^_- J _". ~. u^^wA^A- ^ » _ AAA. ^ ^- A. G A. G AAA. 0 G G G " 



:a?e r a l r s 
. - _ - -■ 2 
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„ riTGGATCTG ATCTGTACCA TGAGCCTACA TAAGGCTGGA 

IIC ™ aatcgtgtgc a^- gtgiggitgg gactgtgtci 

TGGACCTCAG GCTGAGGGCC GAATGTATGT KT tGTGGCA CATTAATRAT 

GCKGAGTAAG AAGACG^T, TGAAGATTCT aaaggtcaat thaagtggca ca 



AAACTCAGAT CTGNTCAAAA GTCCGG 
(2) INFORMATION FOR SEQ ID NO: 36:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 388 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

™ rCTCTGAACA AAAAGCCAGA AGGCTGCTTA aagaaatagt 
CAGCTTTGGA AAGACTTTGA CCTCTGAACA AAA* TGCCTTGGGT 
AAGGGTTTCA CTTGCCCTGG ATACTCAC.A atctagcagt ^ 

~ CAG " = — CCTCATAAGA 

TGACAAATGA AAAAGAAAAA AAGGCCTTGA ta ATGCTGTG AATCTACTTG 

ATGG C,AAAA TTACATACAC ACATACATAG ACAAGGGACC 

AGCTGGATTG catgctccct agggaccacg gtgcccaacc tgtaatttta 



ITATAAATAT ACTCCTTTTT CACGGATG 
(2) INFORMATION FOR SEQ ID NO: 37: 
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 342 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 






(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:

~-r rrTTTTCTAG ATGTCATATC CAAACTTCGC 

G , m G ac.agga.ag, acaggatx.g acttactgtg 

AGXGAXGAGA ACAAAAGTGT .GCCCACCAG GGC GTGX A ^ 

n , tr xrrrCCA TACACGGCAT CATCCCATCT CiAAiiiu 
G ^ G Tc - x^TTCXCX ACCAXACGAC T^GCA.GC A«X 

CATCCAGCGG CTTCTTCCGC rrTCCCT^T MGGATTTCT AAACCTATAG 

C^ATACCAA TTGAAGAACC GCTGTAGGTA CCTCCCTAA-l AA 
TTAGTGTGAT CATGACTTTG CTCAAAGGCA AGTYTCCGAC GC 
(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 355 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

GJTIGAZTZGG AGAATGCCGA AGAGGAAGGC CAGGAGAATG TCGAGATCCT ZCZZTZIGGG 60 

GAGGGAGGGZ AGCGAACCAG AAGCGAATCA CCACACCATA CATGACCAAG TACGAGCGAG 120 

CZZGZGTGZT GGGCACCCGA GCGCTCCAGA TTGCGATGTG 7GCCCCTGTG ATGGTGGAGC ISO 

7GGAGGGGGA GACACAICC7 CTGCTCAT7G CCATGAAGGA ACTCAAGGCC C G AAAG AT CC 240 

CCA7CA7CA7 TCGCCGTTAC C7GCCAGA7G GGAGC7A7GA AGAC7GGGGG GG7KGACGAG 300 

G7CA7CATCA C0GAOT7GAG CTGGAG7CAT CTTTCCTGMC CTTTGCCCCA ZGZCC 20 5 
(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 303 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
^CCAAAAACA NYTCTGAACC CGGT7ZGGGA AATAATGGGA TTCCTTGATC ACGGGACAAC 60 
GAATCACCCT GAAGTTTTTC TCCAGTTTAC TCAGTCACAT AAGCCACOAC AGGCTAACCA 120 
-AC.GACAAC AA-AA G C AA. G T CCCAGGATTO CGGGGGCTAA TACCATGCTA CGCATTACTT 
GCGAAGTTAT GAGTTGGTAT ACATCTGTGA ATTTGGTGGG AGGAGAAAAO TAACAGTAAA 

A.CAAAG CCAGTGGTAC GTT OAGCGTT ATAAAAATTA CAAGGATCTC Z7ZZZZGGGG 

ACT 

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 175 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:



'-" * - ^ -^s 



• ~ - - ■* ^ — ° - •**- - ~~ ~- — _ A _ _ A ^ C 0 „ A 



- ~ w „ ^ 



- - i as e 

ar i d 

linear 



i£; 
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TGCCTTTCTT TAGAAATTTA GGGCAGTGTG ATGCTTCCAG AGGTCTGTAC AAACACCAGC 60
TTTCATTGTG CTTGGGAGTT TCCATGCCTC TYCCTTCTCT TCGCTTAGTG CACGTTTCTG 120
CTTTTTATCA GTTTGACTGC CTGAGACTGA KTCCAACAAC CCAAACTGAA CGCTCAGCTC 180
CTCCKTTTCA AAGGAGGATG ACTTNTCTNA ACAACTATIT AGGTGAATTA TTKCKACAGT 240
TTATTAAAGC AATGGCTCTA AACAAATTCC ACTGGGGGTG ACAAAGTACA ATACAAAAGG 300
CGTACTCTGA GGGCTTGGGG GT 322
(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 278 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
AAACTTTGGC ATTTTTATTC AGACACGTAT AAAAACAAAA CAAAAAACTT CAGTGATACA 60
ACAGACGTTT TCCCTTAGTT CCCCATCCAA GGGGACAGAG GTGTGCAGCT GAAGCTGGAY 120
CTTTTTTCTG TCCTACCTGG AAGCTGTCTC ACTGCTGGAT GAGAATGGCT TCTAAAAGTG 180
GATCTTGGGG ATCCTTGTGA ATTTGCCCTC GGATAAGGAG TGAAGWTCAT TTACGGCACA 240
TGTGGATTAT GGTTTACACA AAGATGTCCA GTTATTTT 278
(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 225 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
AGATCAAAAG ATGAGAGAAG CTGAAACAGA ACCGCATGAG GGAAAGAGGA AAGTGGAATC 60
TCTGTGGCCC ATCTTCAGGA TCCACCACCA GAAAACCCGT TACATCTTCG CCTCTTTTAC 120
AAGCGGAAAG CCAGCAGCAG GATCTCTAGG AATATTAGTA TTAAAGAAGG CTATGCAGCA 180
TAAACCTGAT TTCAAAATGG TAAAAGCAAG GTTATC-TGTA CTTGT 225
(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 305 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:
GGATTGCCAG GAGCTGTTCC AGGTTGGGGA GAGC Z.-.ZkC ;actatttg aaatccagcc 60
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TAGCAATTCC TGTTTCTCCT GCTGTAACTG CTCCTTTTCC TTCTGGAGCA CACGCAGGGC 180 
TGACCGCAGC TGTGTCAGCT TCCGCTTACT TTMTGACAAC TGTACCAGGC TAGAATCCTT 240 
TCTGCCTGGG TCAGCTTCAG TCTTTGAACA 270 
(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 359 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 

CCCTGAAGAG TGGGTGGGAC AACCAGATGG GTGTAACCCC TTGTGGGGGA AAAGGAGTGA 60 

GTTTACTTGG TAAAATAATA ATGGTAATGT CAGCAGCGTG GCTGGGGGAC TCAGTATGGT 120 

CCCGGGAAAA GAGTTGGGGC AGTGAACTTC CCAGGCCGAC TGGCCTTGGG CTGGCAGCAG 180 

GGAGGCTGCA GGGCGCCTAC CTMCTCTGCC ACGTCCCTGC CTAGGAAACC TATCCCAGGA 240 

CACCCTGCTT TGGCCTGGAT AGCAGCCTAG GGATGAGCAT TTCTTTGAAA GCAATTAGGT 300 

TATTCACCTG GTATTAAAAC TATTTACTGT TAAAAAATCT GTGACTTCAT GGARGTGGG 359 

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 271 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 

CCAGGAAGGA CAGGAAGTGT CCTCTAATAC GCATAAGATC CAGTACAGGA GAGATGGGAA 60 

GMGAGKCTCC AGGATGAAGG GGAAAARAGG CCGCATGCCA GTCACCTGGC ATCTNCCAGA 120 

GAGGGYCAGY CTNCCCACTG AGACTGGGGC ACGAGTCCCG TCATCACCAT GCCCTCTGAC 180 

TGTCGAACTG TCTTTTTACC TGACAAATAC TACACAGGTA TCGMTCGTGG CCATACTCTG 240 

CTATCTAAAC CCAGGAACTG ATTAGATTGT T 271 
(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 226 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
CTCCAAGCAG TAAAGACTTG CAAAGCATTG CATTTTGATT AAACCTTGCT GGGCTGAAGG 60 
GChGGCAGhG CTGTGGTGGA CACTGGCAGG ACGCAGCACC CCCCG^CTGG CCCTTGGCAG 120 
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CTGGAGCAGT GGTTCTCAAA CTCGTGTATG CATAGGAATT ACCTGAAGGG CTTGTTAAAA 120 

CACAAACTGC KGGGGGCAGG CCCAGAGTTT CTGGTTGGGG AGGTGTGGGC TGGGCTTGAG 180 

GATGTGAATC TCTCACAAGC TCCCAGGTGA GGCTGCTGGT CTGTGGACCC ACTTCAAAGA 240 

CCCAGTGAAT CAGAAGAGTC AGTGAGACTG GACAAATGAA CGCAAGACAG TCTTCAAAGG 300 

AGACCAGAGG 310 

(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 252 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:55: 

TTTTTTTTTT TYCCGGGGAR GTCAAACATA CTTTTTCAAC ATAGGATKTC TGACAGGAGG 60 

CCCTTGGMCA GGGTTCCCTG ACCTCTGYTT CAAACCCCAC TGGAAACAGA GCAAAGTCAT 120 

CAMGAAAACC CAGGACACCA GGGCAGGGGG GCTGCACAAG GTCGGGTAGG TCACAGTGGG 180 

CCAGCACACA GTGGCCCCGC CCAGGTCCAG CCCAGCCTGG GGGAGGGTGT GAGGGTTCCA 240 

KGCAAGCTCA TT 252 
(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 188 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 
(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 

GTCAAGTCTA CCATCATTCT AGAAGGAAAA GGCATGGTGG GAATTCAGCA CCTGAACTTG 60 

TATTTACACC AGCCTCGGCA TCTGGCAAGG RAATAGCGAT TGTTCATAGT GATGCAGAGA 120 

GAGAACAGGA GGAKGAAGAA CAAATACACA CAAACAACTG ATCTAGGGAG ACTCCAARGA 180 

TCCAACAG 188

(2) INFORMATION FOR SEQ ID NO: 57:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 304 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 
AATCAGCCTG CAAGCAAAAG ATAGGAATAT TCACCTACAG TGGGCACCTC CTTGAAGAAG 60
CTGATAGCTT TTACACAGTA TTAGATTGAA ATAATGGACA GAAACACATT CTTGTCAAGA 120
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 

GTGTTTCAAG GGAAGGCAAC TMCAAGTTTG TGCAGCTGAA TTTCTGTAAA GTTAAGACAG 60 

ACTCAMCTTC TCATTCAATC TGGGGCAGTG GATAACCTTT CTGAATAGAC CCACTTGTTC 120 

ACGGACAGGG ATAGAGGTTT GCCTTTCTTC TTTCCTTGAA TTTGGAGTGA GCACTAGGGA 180 

GGGGAAGTGC ATGGGTGACA TGAAGAAGGT GAAGATGTAG TAAAAGCATC ATCCAGGTAC 240 

ACATTAACGG TGCTGCAGAA TTTTCACAAT ACAACTGAGG GAGTCTGTAG TGGCAAAAGC 300 

AATTACTGAG CACAAAAGCC AGTCCTCAAG GGCTGATTCC ACCTTCCCTG TCCAGGGACT 360 

TTCTCAGCAA ACTTTGTTCA TGAGCAGTTG TTCGCTTTGA TGGTCTTAGC CAGTTTTTGG 420 

TGCAGGGGTG TTCCTCTGGT ACTAGGGCTA GGGCAGCTGT TTAAAG 466 

(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 491 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 

GACACCCCTC CTGCCATGAA GAATGCCACT AGCTCTAAGC AGCTCCCACT GGAACGAGAG 60 

AGCCCCTCAG GGCAGG1GGG GCCTAGGCCA GCCCCCCCGC AGGAAGAGTC CCCTTCCTCT 120 

GAAGCAAAGA GCAGAGGACC CACCCCACCA GCCATGGGCC CACGGGATGC CAGACCTCCT 180 

CGAAGGAGCA GCCAGCCATC TCCAACAGCA GTGCCAGCCT CCGACAGCCC TCCCACCAAG 240 

CAAGAGGTGA AGAAGGCAGG AGAGAGACAC AAGCTGGCAA AGGAGCGGCG AGAAGAGCGT 300 

GCCAAGTACC TGGCGGCCAA GGAAGGCAGT GTGGCTGGGA AGGAGGAGAA AGGCCAAGGT 360 

GCTGCGGGAG GAAGCAAGCT CCATGGAGCG CCGCTGCCGG TTTTAGGGAG CAAACGTCTT 420 

AAAGCCGAGC AACGCCGTTC AAGCCTTGGA GGAACGGCTA GCGGAAGAAG TTTGTGGAAA 480 

ACAAGGGGCG T 491 
(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 478 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62:
ATCATTGAGT ACGCAGAGCT CAAAACAGAC GTGTTCCAGA GCCTGAGGGA AGTGGGCAAT 60 
GCATCCTCTT CTGCCTCCTC ATAGAGCAAG CTCTGTCTCA GGAGGAGGTC TGCGATTTGC 120 
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tccatgosga cctticcaaa acatcttgc: 7agagtc7ac atcaaagagg gggagcgoct iec 

ggagc7ccgg a7gaaacgtc 7ggaagccaa g7a7gccccg g7gcacg7gg 7ccc7c7ga7 2-0 

cgagcggc7g gggaccc7ca gcaaatcgcc attgctcgcg agggtgacct cc7gaccaag 300 

gag0ggg7g7 c7g7ggc7g7 cca7gt7cga ggtcatcctg acccga77gg gag07acg77 360 

caggacccat czgggggggg caccgccacc aatgcgtatg acgtcgatga g77777gag7 ^20 

70ac7gg7g7 gag0gca7ga g70g7g7a07 gaa7cc7gtg gacaacgg77 aag77aca -7s 
(2) INFORMATION FOR SEQ ID NO: 63:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 153 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63:

u - ^ ^ AAA^ i l- G G G T G G G C 0 AG G G G G 0 C A G G C C C AG 0 A.T G OA C 0 C C C A.7 7TTTTTGGGG 6 0 

^ - - - i Gll'woAl-L.o TGCTGAYACC CGGGGCCACA GCGTOA.GGOC GTTGGC-GGTG 120 

^ ~. ^ A. o o A G A G C A G G ^ G A G A G A G 0 OTr OA GGAGZ Z A. 0 AA T T G G G 0 A G A 0 A. G AA G 1£0 

^ ^ ^ p ~ 

(2) INFORMATION FOR SEQ ID NO: 64:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 216 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



- -~- - * - J"- w ^ w -w.- w ^ ^ _ ^ ~. t_ - w — » ^ .—. U ^ , 



:r: sti :s : 

rase rail's 
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ATCTGGTCTA GAGAGGCGAC TCCAAGCTCT CTTGCTGGCT CCCAGCTGTG GGAATCCTTT 60 

AGGCTTGTTC TCAACCTACA CGTTAAAAAT GCTTCTTGGT GTGTTTGGGG AGGGGGAGAG 120 

GGAAACTGAG CTCTCTCTTG ACCTCCTCCA ACACCCTTGA CTTGCTTACC CAGCCATTTT 180 

CAGTAGCTAC ACGGGTGGTC ACAGAACACT GGGCGGCACT CGGCACACAA CACAGAACCG 240 

GGGCAGTCCA TGCAGGTGCG GGAACACATG TCGGACCCAG GGAGCAAGGA ACACGCCACC 300 

CCGAGGAACA TGCAAACGGA GGAAGGATTC CCTTCAGATT CCAAGGATGC CACAACCCCG 360 

ACGGGCGGCT TAGGGAGGCA CCGATTATCT AAGGAAAAAG GCCACTGTTT G 411 
(2) INFORMATION FOR SEQ ID NO: 67: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 413 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67:

CTGCTCCTTA TGTTTTTATT TCCAAAGTTT AGAATTTCTT TGCTTCATAG TATTATTTTA 60 

TTTTACTAAA TTACAGAGTA AGAAAAGCTT TTCATTTTAT CTGATTTTAT TCTTAGAACA 120 

AAAATATTAC GATCTTCTAT ATTTTTGTTC TTTTGCCAAA AAGTGTAGGC AATTTTACAT 180 

CATCTTTTTT CCCAATCAGT TTGTGATCCA ACTATAAAAA GGAGACATAG AATACTGAAT 240 

AAATGAAACA GAAACTCCAA GGCCAAGAAG TGTCCATCTT GAAAGAGTGT TAGTGGCAAG 300 

ATATGTGACT GCAGACTAGA TGTAGACAAA CCTGAGAAAA ACCAAGCATG GGGGAAAGGA 360 

TYCCTATTTT AATAAATGGT GCTGGGGAAA ACTGGCTAGC CATATGTACT TTA 413 
(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 372 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 
GCACGGTTAA AAGACCAACG TGTGTGGNTC AAATATAAAG GCCACACCTT TCAGACCGAA 60
CCTACTCAAA GATCCTTTAC TTTGCAATAA TTTGAACTGG AGAACCAAAG ACGGGAGACG 120
AATGAAAGCA AAGATGCTCA AAGAACCAAA GGAAAGACCT GAAGGAATCC ACCTGCATAG 180
GCCACGCGTT CCACTCTGGG TCAAATGCTT CCACGATGCA GAAACCTTTT TTTAAAAAAG 240
TGCAAGTCTA ATTACCTACC AAGGGTAATA AAAAGCACAG CACAGGAATG ATTACAGCTG 300
ATGGTCAAAA AACAAACCAA AACCATTAAA AAAACAATCA GGCAGAAAAC AGGAGTTAAA 360
TG77TACATA TG 372


WO 93/00353 PCT/US92/05222

(2) INFORMATION FOR SEQ ID NO: 69:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 389 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69:

TCTAGAACCT GGACCCACCC AGCGCGTCCT TTCTTATCCC CGAGTGGATG GATGGATGGA 60

TGGATGOTAG GGATGTTAAT AATTTTAGTG GAACAAAGCC TGTGAAATGA TTGTAOATAG 120

33TTAATTTA TTGTAACGAA TGGCTAGTTT TTATTCTCGT CAAGCCACAA AACCAGTTCA 180

TGCTTAA3CN TTTTTTOCTT TCCTTTCTTT GCTTTTCTTT OTCTCOTCTC ATACTTTCTC 240

-.v^C^_l. _ ^ AATTTTC TTGTGAGATA ATATTCTAAG AOG OT ITAGA AACAAGAAAT 300

ACTCAGTAGT GGATGGGTTT CCCACTTCTC CTC«„ATCCGT TGCAAGAAA.C AATTACTATG 360

GTGCOOTAAT GCACACAAAT AGCTAAGGG 389

(2) INFORMATION FOR SEQ ID NO: 71:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 329 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:

o AAAAAA . G G G AGGG 0 AGOG A T G T A T T AA T T G T A C A T C 0 A A G G AAA 0 T G T 

w - *~ w o ^ ^ ^ A-^_^" ^ o A w ~ G ^ G * AA. w A _ ^ ^ G _ - - 

*~l - -~ * - ^ ~_ ~ ^ ^ ~ 

- — - ~— ~- ^ -*■*- — ~ ~ — G A. ~ - ^ _ AA ^ w A. G A. ~ A. v ^ A. 0 _ ^ A. 0 A. G 

- . ^- _ .-. .-. A - AAA. ^ ^ 0 T C T G A. 0 0 G A. 0- Z CT3CGNGC2C 2 1 AG T G G A G G 3 0 T 0 0- A. 0 3 TT 

- — ^ - - ^ • ' - A* v w A. ^ G A. G A, 0 3 0 0 T 0 0 T 
, 2 ' I N F 3F0UAT 1 0 N" F 2 P, S E 0 " 0' N " - 2 ' 



- ~ ~ - ~ w* ^ ^ 



. ^. O ^ O 



l-~A.„^_-a- CACOT0O0AG t^c.cc.Oua AAA^AG^A 



INGE CHARACTERISTICS : 
LENGTH: -IS base pairs 
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STRAAE2NE33 ■ dcuble 
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AGAATGCTTG TGGTGCCCTT CGAAACCTCG TTTTTGGCAA GTCTACAGAT GAAAATAAAA 
TAGCAATGAA GAATGTTGGT GGGGATACCT GCCTTGTTGC GGCTGTTGAG AAAAATCTAT 
TTGATGCAGA AGTAAGGGAG CTTGTTACAG GAGTCTTTGG AATTATCCCT CATGTGATGC 

CTGTAAAAAT GACATTCATT CGAGATGCTC TCTCAACCTT AACAAACACT GTGATTGT 

(2) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 336 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73: 
CTGAATTTTT ATATGCTTCA CTTAGGCTTT CATTTGAGTA GACTCTAAAA ATTCTGCCTT 
GCTTAAGTNC TAACACTGCC TCTCAGATTT CAGTTTTGGA CATTGCACAA CTAAGACCTT 
TTAAACGCAT TTNCTTGCTA ACTCGGAAGA CACATAGTCT GCAGCAAGAC ATTCCTATAT 
TGAAGAAATG AGAGAAAATT TTATGCTGCA TCAGGTGGAG AGCAAGGCTC AACGGTGGTT 
GCATTAGTTC CCTCGGAAGT ATTGAAAAAN CTTTGAAATG GGAAGGAAAA TTTTTTGCAC 

CTAATGTTCC TGAGGTACCC AGAATGTCTG GGGGTT 

(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 402 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74: 
GTGCTCAGTA AATACAAATT GGATGGACTA GAGAGATAGC CCCGAGGACA CTGCCAAATA 
AATAACAAAT TGTGCAAGCA GCAGGCCGCT GTAATTAGAC CAAGGAGGAC AGTCAGTTAT
TAATATCAGA CACGTGGCAG GGTTAACAGC CACTGAGGGT GGGTACAATG AAGAGAGTCA
CTTTCTGCAC CCTCAGGGAC TTCCCTTGTG ATGGCCTTCT AAAGAGGGCT GAACAGCACC
AAGTGCCCTC GCTGCCTCTG GTTCCTGCTG CCCTCCGCGT GCCTTGGGTG CCCCACAACT
AGGGCCCTGG GTCCCTCCCA TGTCCCCCTC CCTCCTACAA CCCCTCAGCC CCTTATCTGG
CCAGCCATTA TGATGCCTAT CAGTATGAGG CGAGATGAGA GT
(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 454 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75: 

GGAGCCCGGG CCCGCGATGT GGCCCAGTAC CTG77GTCAG ACAG 7 7TOT7 CGTGTGGGTT 62 

C7AGTAAA7A CCGC77GCTG 7GT777GA7G 77GG7GGC7A AGCTCATCCA GTGTATTGTG 12 7 

TTTGGCCCT 7 TTCGAGTGAG TGAGAGACAG CA7CTGAAAG ACA2CATTTTG GAATTTTAT7 ISO 

TTCTACAAGT TCATTTTCAT CTTTGGTGTG CTGAATGTC7 AGACAGTGGA AGAGGTGGTC 240 

ATGTGGTGCC 77TGG7TTG7 CGGA7TTGT7 7TTCTGCACC TGATGGTT7A GCTCTGCAAG 

GNTCGA7TTG A.ATATCTTTC C7TCTCGNC 7 ACCACGGCGA TCAG7AGCCA CGGGTCGAGT 

■^'^.c wC^.G . .iGcjl : . A-.GCTGCTTT 7CC7GC7G7G G A 777 G7GGC CGT77G77CA 417 

^ - A 7 7 ^ l- ~ _ A C A. 7 7 A 7 G G*AA 7 7 7A G A. 7 7 7 G G C 77 4 5 ~ 

(2) INFORMATION FOR SEQ ID NO: 76:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 313 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76:

^ ' ^' ~* - ° - - — - - ^ ^ AU A G T G 7 7 G N T T A. T T A^AA. T AA. T 7 7 A C 7 7 N 7 T T 7 7 C 7 A. 77 z 2 

l^^AouA; A^ A-CTA^AA. TT 77NG7AGCCC TGGGTCTGTI TCTGGAOTCT 120 

^ v_ ^ ^- ^ u ^ o j. _ ^ ^ t - o ^ w _ . ^ ^ w o A o ^ ^ A i ^ A. T G A. G T G A. GOT AA.T G GG G G GG T G A. G AO*. TOOT 1 £ 7 

- ^ w- GGA-AT 7T G7CA3GN7CA CCCCNGAGCA GTCCACCCCN CAOiCTCATTA N7A7CCTT7A 2-7 

■V--A.-.- . ^-^-N - . G A772N7777A 7ACA777A77 07G77AAA7G CAC77TAGGA ACTGT7AAA7 3 77 

^ w ^ .AAA 7777 C AA. ZG~ 

'2 - N F 2 R71AT 1 C N" FIT 5 E ~ 12 N" 2 ^ ~ ■ 
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TCGTACAGAA ATGTCAGCTC CTGCAGCTTT GGTGCTCTTC TCGTGGTTCT TCGCTCTTTC 360 
AGCTTTCTCG TAGTCAAGCC TGAAGGCTTC TCTAAGCTCT AACTGGAGCT TCTGATTTAA 420 
GGTCTTTTGA GCTCATCAAA TGGTCT 446 
(2) INFORMATION FOR SEQ ID NO: 78: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 296 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78: 

kGCGGGTGGC GCAATGGAGA GAATGTGCCT GAGACAGAGC GCCTGGCTGG GGAGGAGGCA 60 

GCCCTGGG^G CCGAGCTCTG TGAGGAGACC CCTGTGAATG ACAACTCATC CATCGTGGTG 120 

CGCATCGCGC CCGAGGAGCG GCAGAAATAC GAGGAGGAGA TCCGCCGTCT CTATAAGCAG 180 

CTTNACGACA AGGATGATGA AATCAACCAA CAAAGCCAAC TCATAGAGNA GCTCAAGCAG 240 

CAAATNCTGG ACCAGGAAGA GCTGCTGGTG TNCACCCGAG GAGACAACGA GAAGGT 296 

(2) INFORMATION FOR SEQ ID NO: 79: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 285 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79: 

CCTTTCCTGC CTGGGAAGTG ATGACTCGCA GGTCGGGCTT GCGGCTGGGG GCTCCAAGCT 60 

GGGTGCTGTG GGTAGGTGGG GGCGGAGACT TGGCAGGGAT GACCTTGTTT AGGCTGTTGC 120 

CATTGGCCAC AGGGAGGAGG CCAGGGGAAG CCCGAGCACT GACGTAGCCA TTCCCAACAG 180 

GGCTGGGGCA GGCTCCGTTA GCACTGTTCA GGTCACCNCC CAGCATGGCC CCCGCACTAG 240 

CTGGCCGGTG GGGCAGGCCA GGAGACACAC TGTTCCTCTG TAGTG 285 
(2) INFORMATION FOR SEQ ID NO: 80:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 402 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION : SEQ ID NO: 80: 

ATGATTTCTT GCCTGTNATA ACCTATGCAC TCACAAAGAT GAACTCTCTG AGAGGGATGA 60 

GCAAGAGCTT CAGGAAATCC GAAAGTATTT CTCCTTTCCT GTATTCTTTT TCAAAGTGCC 120 

GAAAZ7GGGC TCGGAGATAA TAGACTCCTC AACCAGGAGA ATGGAGAGCG AAAGATCACC 180 
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ATAAGACCAT TGGCAAAGGG AGAATTCATG AACTGAAAGA TCTGAAGTAA TTTCCCAGAA 60 

TGTAATGTTA AGAAATAAGT TAAAAGGCAG AGCATAATGA GTCTAACATG TGTGATTGAA 120 

GTCTTATAAG GMGAGAATTA AGAMCAGGCA ATATTTTAAA GGRATAATGG AGAAAATGGA 180 

ATAATTGATG AAATATGTGA ATATATATAG GGACCATATG CATATGAMGG CCGGGGGTTA 240 

AATAAAACGA AATCTACTTG TACATACTTT ATGGGATTCC TGCAGCCCGG GGGGATCCAC 300 

TAGTTCTT 308 
(2) INFORMATION FOR SEQ ID NO: 84: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 313 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84:

CTTTAACTTA ATGGCAATTA AAACTCACTG GCAAAAAAAA TCACTAGAGA TGTCAGTCCA 60 

TTATCTTACC AAATAGTGTA TTTTTACCAT CTTTTACCTA CACCCTTGAG TAAGGTGGAA 120 

TAGGTTAAAG TTACTGGCAT AATAACACTT CATTGAATTC ATGATAGTAT TTAACATGTT 180 

AAAACTGTTT AGTTGAAAAG TTCACATGCA ATTTATAATT TAAAAATATG CTACATATAT 240 

TTCATAAAAW TACAATAGGT CATACTARAC TTTGACTAAA ATTAAGAATG TKTTTCTKTC 300 

ATAATAATGC AGG 313 
(2) INFORMATION FOR SEQ ID NO: 85: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 303 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85: 
TGCTCCGTTT ATTGCTCTAT TCAATGACCA CGAGCGAATT ATAAAAAGAC ACCAAATGTC 60 
TCTGTCTGCC GTGGGATAAA TATTTAAAGT CAGCAATAAA GTCACGTGGC TCCAAGRTAA 120
TACATGTTGC CAAAGAGTCA TGCATGCCCT CCTGATGGGC TCTCAACACA CGTATGGWCA 180
TGGGAACACA CGCAGAGCAA CACGCAGTAT GAACTTSTGG GAAGGCTTTA CCACAGTGAC 240
ACAGTAAAAT GTCTCACGTA GATCTGRGCT GAGTCCCCAC CCAAACCTTG AGCTCCCCTT 300

CCA 303

(2) INFORMATION FOR SEQ ID NO: 86: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 380 base pairs
(B) TYPE: nucleic acid
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CC) STRAGNDEDNESS : double 
(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86:

AAAAlAAACC AGCTTTAATA CCAATATAGT TCTCTCTTAA 

AAT GCAGGGG CAGGCTCTTG G C A G AAA G A G T A G AAA G G AA 

GATGGCCCAG GZZZAGGGZZ CCTGCCTTGG GCACTAGGGA 

GGGAGTG AC A CCAGCTCCCC CTGGTCCAGT TATTGCAGAG 

TICCCAGGCC TGAAACATTT CTCAGGATTA CTTCTGACCT 

GZZ7GGGZZZ CTCTGGTCTA GGP~GGGZCZ CTTTG CCCAA 

A - ^ ^ - ^ Lr G u'-'GG GA'c C C C 

(2) INFORMATION FOR SEQ ID NO: 87:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2E0 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87:

- r-- .n_--u.~. ^ A. * G G ^ A-AGAGATGTC 

-LA A G vj A C J.CTGCTGCCA GZZTTGGGGC 

^.~*A.~ *. L ^ v-AA GCATGCTCCT 



- — J~».~* -\S ^ o ^ . 



— -*"* ^' ^ ~— w ~. S_T ~. ^ ^ ^ ^ - w ' ~„ V 



■ ur.L- o i. A ^ C T L- •' 



i ' S E : 



(2) INFORMATION FOR SEQ ID NO: 88:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 446 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear



ATACCGTGTT TCCCAGGACA 
ATGTGGAACA AAA T G G AAT G 
C7GGGGTGCZ TCG LjGG ATG G 
G C G T 'ZGGGGG CT C C Z CI C Z Z 
TCAGCCCCAG CA CAGG 
*~*A l-ujCui .*, ^ A* uC x AA* G G C u 



nGG r.G .* . . ^ C ^ w ^- » o A* G 

A. *, A* o ^ ^ G w A. A* 1 1 A* G L 0 ^ G _ 

Lr ^ rA. ~ ALLAA. G A* C * - w ^ ~ o 

^ -._.*-*. ^ -M vjr o ^ w ^ ^ o \_: ^: n 



60 



30: 

2 £ C 

- C r 
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CCCGTCCCAC AGTGACCCTT CCCATACTTC TGGGGGGGCT GCTCTCCATC TGGATCGTAG 420 
GAGGATATAG GTGTGTTCTG GACCAT 446 
(2) INFORMATION FOR SEQ ID NO: 89: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 384 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89: 

GTCCCTTCTG GGGACTCTRT TTCCCCATTT ATTGCTGCTG TGTCCCTNAC CAGTTCCTTG 60 

CAGGATTCCC TCCTTTTAAA ATGCCCTTAA ATCTAGCTTT GCCTTGGAGA CCCCAGTGGG 120 

TGCTGCTCCT GCCGTTTTCT TCCTGCCAAG CCTGAATCAA TGTTTCATCT CCAACCCTCT 180 

GCCAGTTTGG CCCCTCAAAG CTTGGTGGCT CAAGACTGTW AGCCTGGCAG AGCCGCGNGG 240 

TGAAGGGAGA AGCTCTTGGA GCAGGCAGGA TGCCACCGCT GCTTCAGCTT GCCTCCTCGC 300 

CCAGCTACCC TTTGGCCCCA TTGGGCCCTC GTMTGCCTCT CCAGGATTGT ATGTTTCAAG 360 

NCTTGTCCTG TGTTCCTTTG TCTG 384 

(2) INFORMATION FOR SEQ ID NO: 90: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 344 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90: 

TCAAGCTGGA AAGGGCTACT ACCTCATGCT GGAAAGGGCT ACTACCTCAA GCTGGAAAGG 60 

GCTACTACCT CAAGCTGGAA AGGGCTACTA CCTCAAGCTG GAAAGGGCTA CTACCTCAAG 120 

CTGGAAAGAG CTACTACCTC AAGCTGGAAA GGGCTACTAC CTCATGCTGG AAAGGGCTAC 180 

TACCTCAAGC TGGAAAGAGC TACTACCTCA AGCTGGAAAG GGCTACTACC TCAAGCTGGA 240 

AAGGGCTACT ACCTCAAGCT GGAAAGAGCT ACTACCTCCA AGCTGGAAAG GGCTACTACC 300 

TCATGCTGGG AAAGGGCTAC TACCTCAAGC TGGACAGGGC TACT 344 

(2) INFORMATION FOR SEQ ID NO: 91:

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 364 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91: 
.oAcGGT GAGGGCTATG AGGGGTCAGG GGCGA.CGTT2 CCCAGGACGG TA.GTGCTT



V-J — * 



50 

;ct GGTGCTAAAT AAAAGTGAAT AAATACTAAA TAAATACAAC 7GGGGCZCAG 120 
:TG CCTTCCCCCT CCCTCCTGTG ACCCGCAGCA GAGGGGGCkG TTTAGATGGA 180
GGGCTGTCTG TCAGCCCCTT CCATCCACTA ACCCATCACT GCCTCCCAGG GCAGGAAACC 240
AGGGCAGSGC CAGGCTGCGC ATTAGGGGAG AGAGGAGGGG CAGGTCTCAC GCCCACAGCC 300
:GCAGT TGAGTCTTAG CATGAGGCAG CAACAGAAG2 TCTCTCTTC 2 TCCCAGCTAA 360



(2) INFORMATION FOR SEQ ID NO: 92:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 215 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 92: 

ATTTAATAGA AAATTAAAAT AA T AAA T AA T A T G AAA C A G A CTGATAACGO TGAG2TGGG2 60 

„-L-~ -Ao.^TAGTA C.AAAGTTAAG GAGGTAGGGA GG AT GGTGGG GAGGAGGGGG 020 

0GGA0CA000 IG0AGGA0G2 GGGAGGCTGC TCAGACTGTO GTGATGTCAG GAAGGG 0CG 2 ISO 

~ ^ * - w w ^ o G A C A T G C A. C T AAJLAAA A. G A. G A-AA G 215 

(2) INFORMATION FOR SEQ ID NO: 93:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 264 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

^ vrAA^AAA^AA T0GG22T2G2 AGTG 2 2 OTG0 AGA^AGGAGOT 02A0A02AT0 61 
^- „ " ~'~ * ~ ' ~ ^ w ^ _ ~. „ - -> ~ - - - - ^ AG ^ A C A. Z A 1 2 TTA. CTACA.C 2 A^A 0 0 2 G 2T 2 1 2 1 

" ~ ~ ~~~ ~" - - - ^ - - ■-- ----- - - ^- ~ uAc 2^02212 A. OA. TT G T G 2 A. 2 AA. 2 T G T A. GT2TG 151 



■. - w - - - ^ A. ~ AA. o G AA. 2 G T 2 A _ O.^A - 22^02 2- G 2 A. G A. - ; 



. _ - v \_r _~ „ 



(A) LENGTH: 423 base pairs

(B) TYPE: nucleic acid



PCT/US92/05222 



WO 93/00353 



-114- 



(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94:

GTTCATACTA GAAGTGTCTG CCAIGTTM TTGTTTGTTI TCAGGAAAAT TGGAGAGAAA 60
AGTATTTCTT TTTTAAAAAT GATTATTATA CTTTAAGTTC TGGGATACAT GTGCAGAACG 120
TGCACGTTTG TTACATAAGT ATACACGTGG CATGGTGGTT TGCTGCACCC AICAACCCGT 180
GATGTACATT AGGIAITICI CCTAATGCTA TGCCTCGCCT AGCCCCCCAC CCTCCAACAG 240
GCTCCAGTGT GTGATGTTCG CCTCCCTGTC TCGATGTGTT CTCATTGTTC AACTCCCACT 300
TATGAGTGAG GGACATGCAG TGTTTGATTT TCTGTTCCTG TGTTACTTTG CTGAGAATGA 360
TGGCTTCCAG ATTCATCCAT GTGCTTGGAA AGGCATGAAC TCATCCTTTT TATGGCTGCA 420
TAG 423

(2) INFORMATION FOR SEQ ID NO: 95:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 405 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 95:

AACAGCCCCC GATCTGCATA GCCTGTGAAA GCCCACGGGG ACATCAGTAA CCTTCTGCAG 60
CCCCATCCA ATGCCATTAC TGTNAAGTGA GACTTGGCCA CTGTAGCCTG GGCCTGCTGC 120
AGGAGCTCTT CAGAAAGGCA CATGAGGACC ACGGTTTGCC TCAGTTTCTG GTAAAACACA 180
.GGTCTGGAG TGCCCCTGCA AAGGGTATTG ATGGACTTCC TGCCAGTGAC AGAGCATGTC 240
T'TTGCAAAC AATTCTCTCA GTTACGTTCA GCACTTAAGA ACGGCTAATG NCAATAGGAT 300
CTTTAGCAAC TTTTTCACAT CATAGAAGGT GCAATCGCTC ACTTGGGAAC ACTACTGAGA 360
GTGACTTCTC TTTTAAAATT GAGTAGCAGA TGAAAAATTA AAATT 405

(2) INFORMATION FOR SEQ ID NO: 96:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 173 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96:

G^GACAATA CTGATGCCAG CTCTTTGTAA TTGTGAAATC TGTACCCAAA CCTCTGGATT 60
^ATCTCCA GTTGTCTACT GTAAAlACTG GAATTACAGC AAAGGATATG GGGACTGGGC 120
~r> r^TCTG TATTGTACAA GCACTATTCT AGATATTAAA GAAATTTAAC CGC 173
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 333 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100:

AAAATGCTCA CAGTGGTCTT CTCTGGCCGG TGAGCCTACA GCTGATCTTG TCAGAGACAA 60 

ACGTTAGTTT TACTGAGTCA CCCAGAGCCC TGTGCTGGTG CCTGAGGGTT TGTTCCATGG 120 

GACAGTCTCC ACAATTCCTC TGGGGAAGGG CCACAAATCC CACAGTGTGT CCCAAGAGGG 180 

CTGGAGTAGG CGGAGTCCCC AGCAGCTGTG GCATGACCAG CCATCTCTCT CAAAACAATT 240 

GTTAACAAGC CTTCTGCAAG TTAAGGTTCC ACATGGTAGC CGTGGTACAG AGGCATTTCT 300 

CTAGGGTGGG AGAGGCTTGT GCTCTACACC AGG 333 

(2) INFORMATION FOR SEQ ID NO: 101: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 156 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101: 
CTCTGACTTT CCTGTGGNTT TAGAGCCAAG CTCAAGGTAG TAGGCCGTAG GGNCTTATTT 60 
TATTTTCAAA CCCCCATCCT CAGAGCGCAG ATACATGCAG AGGCTTCTGC CAGGCTACCA 120
CGGGGCCTTA GTGGGAACAG GTTGAGACCA GCACTT 156 
(2) INFORMATION FOR SEQ ID NO: 102: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 331 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102: 

CGAAAAGGGG NNNTATGGCC ATCTTTTATC AGAAAAAGTG ACAAAACGGG AATTTAAAAA 60 

ATGAATTTTC NNTCTGACTT TATTTNNAAA TACACTTTCT TTTTTNNAAA ACCAATACAC 120

TTTCTTTGAG GATGACAGTA TTAGGAAATC CAATTNNACA AAAAATACTA CATCTAGTCT 180

GGGGTAGATA TATTTATTTT TGGTAACATA CATTAAGTGG CACTAATTAC ACAGTAACTA 240

TAAGGTAACT AACATGAAAC CACAGAACTG TAACTCTGCC ACAGCTGCAT GAACTTGGGC 300

TTTTCTGGTT GAGCCCATTT TCAAAAAACT G 331
(2) INFORMATION FOR SEQ ID NO: 103:
(i) SEQUENCE CHARACTERISTICS: 




CATAACAATA ACAATAATGA CATCTTACAA CTTACTGCCA CCACCAAGCT TGCTG 355 

(2) INFORMATION FOR SEQ ID NO: 106: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 355 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106: 

GGATGAGGTC GCCGGGATCG TGGCTGCACG CCACTGCAAG ACCAACATCG TCACAGCTTC 60 

CGTGGACGCC ATTAATTTTC ATGACAAGAT CAGAAAAGGC TGCGTCATCA CCATCTCGGG 120 

ACGCATGACC TTCACGAGCA ATAAGTCCAT GGAGATCGAG GTGTTGGTGG ACGCCGACGG 180 

TGTTGTGGAC AGCTCTCAGA AGCGNTACCG GGCCGCCAGT GCCTTCTTCA CCTACGTGTC 240 

GCTGAGCCAG GAAGGCAGGT CGCTGCCTGT GCCCCAGNTG GTGCCCGAGA CCGAGGACGA 300 

GAAGAAGCGC TTTTAGGAAG GCAAAGGGCG GTACCTGCAG ATGAAGGCGA GGGAG 355 
(2) INFORMATION FOR SEQ ID NO: 107: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 273 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107: 

GTGTCTCTTT TAAAGAAAAC ATACTTTATT TTGGTCTAAA TTGTGAAAAT ACCCAAAACA 60 

TTTGATAGAA ATTGAACTCT GTCAACAGTG TTATTTATAC TAAGATCAGG ACAGTTCCTT 120 

GAGATCATAC TGTTTTATTA CTAAGTTTGG CCTTTGTTTT ACAAATGTAA TGTTCATATT 180

TATTTGAATT TTAAGATTGG TTAAATGTTA ATGAAAAGCA ATCCAATTGT TANTTTTTAG 240 

TAGTGCCTTT TCTCTGTATG CCTTAATTTT ATT 273 
(2) INFORMATION FOR SEQ ID NO: 108: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 359 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 108: 

ATTTTATTTC CTTACATCGA AGAAAATGTT AAAGAGTATC TGCAGACACA TTGGGAAGAA 60 

CAGGAGTGCC AGCAGGATGT CAGTCTTTTG AGGAAACAGG CTGAAGAGGA CGCCCACCTG 120

GATGGGGCTG TTCCTATCCC TGCAGCATCT GGGAA7GGAG TGGATGATCT GCAACAGATG 180
AlCLAGGCCG TGGTAGATAA TGTGTGCTGG CAGATGTGCC TGGNTCGAAA GACCACTGCA 240
CTCAAACAGC TGGAGGGCCA CATGTGGAGG GCGGCATTCA CAGCTGGGCG CATGAAAGCA 300
GAGTTCTTTG CAGA7G7AG7 TCCAGCAGTC AGG7AAG7GG AGAGAGGCCG GGA7GAAGG 359

(2) INFORMATION FOR SEQ ID NO : 109 : 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 360 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 109: 

T7TATNAAAG CAGTTAAACT TAGCAT7AAA 7AACACTCTT TAAATGGTAC ACCTATGAAG 60

CAAGAGTTAA ATATAAAIOC AGTCTAATCC TGTACACTTG TGATTAATTG TGACAATCTT 120

AAG7TGCTCA CTTC7TT:0C ATTTACCAAT 7CAGAGAAAG CCCGTTTCC7 GT7TTCTCCT 180

CACCACTT7G CGTTGGCA7C ACACCAACCC TGCC7CGGGC 77CAGC7GCA GA7CC7CCCC 

rtu-woCL.ww. C^ZAGCTGGG CTGACTCCAG 7CCCAGCZCC AG7CTCCACC AACTGAGCAG 

■ CAGG GTTGTG7CTG GCTTCCAGCA 7CTACCAACC CTTCAGAGCA ACTTCCAACA 



(2) INFORMATION FOR SEQ ID NO: 110:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 364 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double



Tvrr 7-TC' 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 110:



- * ^.-.^.-.l-^-^ o~_^l.-l^-G T0AT_ lAA.Go GGGACTTOTA G- 



.COAAGGTIC OOACGGOAAG G OTGTTGGGT G ITGGCAGCA 



A.A,.G.ACA. AGCAG..G.: IGAGC.CAIA G IAG7GAC 07 CAGATCTC OA GIAGOAA 



'o c ^ 



V? ^ v." ^_ ^ ^ ^ _ 



— - - -~- - i «^ o w ^ A ^ A ^- w A 0 _ A_A G 0 ^- ^ 0 _ 



- ^ - - — — • - - - - - ^ _- ~* - 



240 
3 00 
360 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 111: 

TTTTTTTTTT TATATTTTAA ATGGAATTTA TTCTATCAAC TGCCTGAGAG GACACAATGG 60 

GGGAGGGGCT TCGGACCACA GCAGGAGCCC CGACTGCCCA CCTGAGGGCA GGGAGAGCCT 120 

GACCCCATTG GCCCAGGCCC TGGCTCTGTA ACCATTAACC TCTTCCCCCA ACTAACACCA 180 

ATGAAAACAC CATTCCACGT GACTGGGCTG TGTGTTTGCC TCTGTGACAT GGGGACCCCT 240 

GACCCTAGGG GTCTCGCCTG AGCCAGACCT GAGGGACCCA CCCGCGTAGG ATGGAGGAAG 300 

GTTTAGGCCT CCCTTTTGCC AGCCAACGCC GGGGGGTGGG GCAGACCCTG GGAGTGGGCC 360 

TTACAGACCA GCCACAGGTA TTTCTTAGGC AATTTGACAC ATTTTATTAC AAAACCAGTC 420 

TACATTCATT CCTAAAAGGG TCATTTTCAG TAAAA 455 
(2) INFORMATION FOR SEQ ID NO: 112: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 398 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112: 

CTGATCTGAC AGGAGGTGTA GGTCAGGCAG TAATGGAAGT SATGGGGAAC AGCTGTAAAT 60 

ACAGATAAAG CTTTACTCAC TCGCCCACCC ACTGCTCATC TCCTGCTGTA CTGCCCAGTT 120

CCTAACAGAC AGCAGACAGC TACTGGTCTG TSGCCCAAGG GTTGGGGACC CCTGACATAG 180

ACTAAACAAT TCACAATGTT TATATTAAAC AACTTATTCC AAGTTTCCAT TTTAGACTCT 240

GGAACATCTG ACATGGTGAA TCCACAGGTA GTAAATSGGA AGGGAGATAA CAGACAACTT 300 

GACGGCCGTG GAAGACGCAC TGGGCGGGCA CTGGTGACGG GTCTCGGGAC AGACTTCACA 360 

TCTCCAGACT GGCACAGTGG GCTCACACCT GCCTCCCA 398 
(2) INFORMATION FOR SEQ ID NO: 113: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 444 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113: 

ATCAGTGTCA GTGTCTAACA GAAGGGTCTG TTAAGGATGC TTCTGATTTA ACCAAAAGAT 60

TAAGCTTCAG AAACAATCTA ACATACTCAA AGGAGCACCA AATTATCAAC CGGCTACAAG 120

GATGCAAAGG ACCTAAACAA CAGATGTCAA AGGGCTTGTA AAAACTGGAG CCAGCAACCA 180

TTCCACTTGA AGGAATCCAT CTCAGGGAAA TGCTGGAATC CACACACAAA AGCAGGTGTG 240

CAAATAATCA CTGCAGCACG CCTTCTAATA GTGAACAACA GAGGCAATCC AAATATCCTT 300
TTTGAAAATT TAGAGGATAT TTATTTCTCA GGAAGGTGCA CAACAGCTGG CAGGCACTGC 60

TTTCCCTGCT CTAGGGGATT CCTCTCTCCT TTTCCAAGAA ATCCCCTCTC TTCTTAGAAG 120 

TGCCCATGGG AGGCTGGGAT GTGAAAAGAA ACCATACACA ACACTCCAGA GCCTTAAAAA 180

AATAAAGCAA CAACCTCCTC CACACGAATA CACTTACAAA ATAAATAGAC GGATAAAAGA 240 

GAGGCCACGT GCCTCCCATC CCGGCTGTAG GGCTGCTTGG GGATAGTGGG GCTGGGTGGC 300 

TCGGTCCCAC TTCTCCCAGC CAGGATGATC CAAAGGCTAA ATGGGATGGA AGGGCCCTGG 360 

CTTTCAGAGA GAGGGTGGGG CAGGCCTCTC CTGGTACTCA GCAGGGAGGA CACTGGGGCA 420 

CGGGTAGGGG TCCAAGGGCC ACTTAATA 448 

(2) INFORMATION FOR SEQ ID NO: 117:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 551 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 117: 

GAGACGGAGG CTCGCTCTGT CCCCCAGGCT GGAGTGCAGT GGCGAGATCT CAGCTCACTG 60 

CAAGCTCCGC CTCCCGGGTT CACGCCATTC TCCTGCCTCA GCCTCCCGAG TAGCTGGGAG 120 

CCAGCGCGCC CAGCCTAAAA AACTTTTCAA GTCAATATTA CTACGATTTA ACATTAGAGT 180 

GTGGACATGT GATTTAATCG CTATAGCTAA AATACGTCAA ATATACGTTG TCATGTGCTT 240 

GAACATGATG CTAACCCTGA CAGGATGAAG GAAAGTAATA TTCTTTCAGT GTAGTTCAGG 300 

AGAGCATTTG TTTTCTTTTC TACCAATTAA CCCATCATTG CTTTTAAACA ACCATCTGAA 360

GGAGCAGAGA GGCAGGGTAG AAGACAGAAG GGGGTCTATG TGGGTACTAA AGATGTTTCT 420 

GTTTTGTAAT ATTGTGTGTG TGTGGGTTTA TGGTTTGCTT AAGGGATCAA AACCTGGAAA 480 

AAATGGGATT CCAGGAATGG CTCTGTTATT TTTGCTGGGT TCCAGCTTGT AATGCCTACT 540

GCCTTGGTTC A 551 
(2) INFORMATION FOR SEQ ID NO: 118: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 426 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 118: 

CCCCACCCCA AAATCAAAAC TGAAGGTAGT GTCAGTGTAT ATATGGNGTC CCTTGTGCTG 60

AAAGTCAAAG CAGCTTCATT TTGGGGCCTC AAGAGCTCCA GCTCTGGGCT CTTCACCTCT 120

AAGCCCATGG GGAGTGCCCG CCCAGTGGTG TGTATAGATC GGAGGCTGAG GGCCTCACCC 180
.iAw_u-A.;C 7G7 GGGG7GG 7GGGGAGGG7 G7GGAGGAGG G7A.GA.AG7AG GAJLAGTGCCA
TGTGCATGGG AAGAAAAATG GAGGGTGGTT GGZAGZGCGG ATGGGGTCGA GGAGACGGAG 
GGAGGTTGGG GAGAGGGAGG TGAGTGGCAT TCCTGTAGGA AAGCAGCCCA GATCTTGGGG 
CCGIAACGGA 7G77C7GGAA G777TGAC77 TGAACCACCA GG7GGGA77G T7AACAAGCT 420 
7G77GA 

(2) INFORMATION FOR SEQ ID NO: 119:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 434 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119:

A^AAA. C ^ ^ T A ^ TT A G T T T T C A. G G G A^AA.T A.T A. A. G A. 7 G G A 7 G 7 6 0 

~— " - ^ * — ~* : - -~ ^ - - f+ ■>_■ r-„A CAi'^uu AA. G A. C A G A. G G 12 0 

^ ^ ^ _ v.- x ^ o j. o A. i AA--A G «wi A. L A. 2 C ^ GGAA. A. G 7 G G G A. G G G ISO 



- ^ w w * ^ — ~- ^ * r-^ ^ o A ^ A. x - A. AA-AA* C A— AA G C 2 C 

: ^ . o Go A. iAc'*^. . A. uA 2 0 A. C 0 AY 7 G A. 77 C 7 G G G 3 0 2 

l b-^A.^ ^ GA.G777AAi7G T G CTTTCTC C AG777C7C7G 2 6 C 

jL'ooGG AA.GGGGGTGC T2C7GACCCC A 2AGGG G CAC 42 G 
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 121: 
ATTTCTTTCC TTAATCATAT CTGATGCTGG GATGTGGGTA ACCCCAAACT GAAGGCAGCT 
GCTAAATCTC AAATGCTAAA AAAATACTGC AATTTTGACA TCAGTGAGTC AGATCAATAC 
ATCCTCTGGG GCTGATTTTG CTTCACAGTT AGGATGAGCC ATCTCTTAAG CTGCAGGCTC 
AAATGGGATT AACTGAACTC TATACCTGGG ATGGGCCATG GACTGAGCTG TCCATGCAGA
AGGACCAGGC TGTCCATGCC TTCCCTGCCC TTTTACTCAC CACTGCACAG CAGCCCCAGT 
GGGCCTACTG CACATGTCTA GGAGAAATCA CTCTAAGAAA ACCAACAGGA ACAGGCTTTA 
GGCAACAAGA GACGTCTCAC TGCATCTCCT CCCACGTCAG AACTTGAGTA CTGGGTCTTT 
GCAGCTCAGA GCATTCCTCC CTTCCCTTTC CTGCCCGAAA GGCCTGCCTT TTCCTGAGAC 
ATATGGCACT CCATGCTGCA AGTTTCAAGC AGATGCAGGT TCTTATGGGG CTTTTTGCTC 
AAAGAGCTTT GGTT 

(2) INFORMATION FOR SEQ ID NO: 122: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 238 base pairs 

(B ) TYPE : nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 122: 
CACCTAAGCA GGTAGACATC CGCAAAGTCA GATGCTTTCC AACATGACAC CTGAACATCT 
TCCTTTATGC AACACCCAAA CATCTTGGCA TCCCCACCCC AGGAAGTGCG GGGAGGAGGT 
TATGATCCCT GGGCGCTICG GCAGAATGGA GAGCTGAGGT GTCCCTCCCC TGCTAGTCAC 
CTACCAGGTG TCTGAGCAGC TGCATGCTCC CTGGCTCAAG TGGGCACTGT ACCTTTTG 
(2) INFORMATION FOR SEQ ID NO: 123:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 244 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 123: 
ATCCAGGCTT TCATTTCTAG CCAACCCTCA AACACCACCA ACTACAAAGA AAATTTAAAA 
GTCTAATTTG TAACCTTCAG ATAAGTATAA ATTAGTTTTT TCTAGGCTTT CATTATTTGG 
CTTCTTATAC AATCTATCTT GTAAAGTACA TTCCTCTAAA TTTACATTAT CTAAAATTAA 
GGCTAAGCAT TATTTAAATC ANTTAATCAT ACAATATTII ATGGCAATAT GCACATATTT 
TCTTAGTTTG CTGTCGCGTC TGTTTT 266
(2) INFORMATION FOR SEQ ID NO: 127:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 435 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 127: 

GTCTGGTTCT ATTCATTTTG TAGTTGCGAG AAAAGGAATG AACCGTGACT ATGGCAATTC 60 

ACCGTGACGT GTGATAATTT AGTTTGCTAT GAGTTTTCAC TCTTAGGTAA AACCTAGTTA 120 

TCCTAATTAA TAATTAGTTA TGGATGATAT AGTAATTTTT TTTTTTTTTG ACTGCGTCTC 180 

ACTGTCATTC GGGCTGGAGT ACAGTGGCTG ATCACAGTTC GGTGCAGCCT CGACCTCCCT 240 

GGGCTCAGTG ATTCTCCTGC CTCAGCTTCC CAAGTGGCTG GGGATTATGG GCATGCACCA 300

TCAATGTCTG GCTAATGTTT GGTGTGTTTT TTTATAAAGC CAAGGGTTTT GCCCATGNTT 360 

CAAGACCCCG GGGCTGGTCC TTGAACCTCT TTGGGGCTTC AGGCAAGTCC TCCCACCTTC 420 

GGGCCTTCCC AAAGT 435 
(2) INFORMATION FOR SEQ ID NO: 128: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 471 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 128: 

TTCCCTTCCC AAGGACTCGA CCTGAGAACC GCCATGTACT CGGAGATCCA GkGGGAGCGG 60 

GCAGACATTG GGGGCCTGAT GGCCCGGCCA GAATACAGAG AGTGGAATCC GGAGCTCATC 120 

AAGCCCAAGA AGCTGCTGAA CCCCGTGAAG GCCTCTCGGA GTCACCAGGA GCTCCACCGG 180 

GAGCTGCTCA TGAACCACAG AAGGGGCCTT GGTGTGGACA GCAAGCCAGA GCTGCAGCGT 240

GTCCTAGAGC ACCGCCGGCG GAACCAGCTC ATCAAGAAGA AGAAGGAGGA GCTGGAAGCC 300

AAAGCGGCTG CAGTGCCCCT TTGAGCAGGA GCTGCTGAGA CGGCAGCAGA GGCTGAACCA 360

GCTGGAAAAA CCACCAGAGA AGGAAGAGGT TCACGCCCCC GAGTTTATTA AGTCAAGGGA 420

AACCTTCGGA GATTTCCACA CTGACCAGCG AGAGAGAGAG CTTTAGGGCC A 471

(2) INFORMATION FOR SEQ ID NO: 129: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 186 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 129:
CCC77TAACA TCCTCTGCCA ATRACTGGCC 7CAAA7CACC AG7GGAACCT TTTCAAAAAA 60
TAGACCATTG GCTC7ATG7A G77C7AG7GA TCTRAAA7A7 CCACG7G7GG CCCAGGAGCA 120

CTGGCTCATG CC7G7.AATC 0 CAGCA7G77G GGAGAGCGAG GAAGGAGGA7 CA777RAGCC 180
CAGGA.G

(2) INFORMATION FOR SEQ ID NO: 130: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 307 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 130:

A_ AAAA..A.GT TA.GGA_ATA.TA CCTAACCAAG AA G G T G AAAA ACCCCTCCAA G G AAAA. C T A T 

GAAAOACTGC TGAAAGAAAT CATAGA0TAC ACAAATACAT TTCATGCTOA AGGATGGGTA 

GAATCAATAT TGTGAAAATG GCCATACTGG CAAAAGGGAT CTVCAAATTC AA0GGTATCC 

^CATYAAAlA CCACEATCMT TOTTTACAGG NTTCGGAAAA GGAATTCTAA A_ATTGA.TA.TG 

'° w ~ " w w ^ 0 ^ G G 0 0 G G C A T A G 0 0 0 A T G G C 0 G G 0 T T A 2 S AA- W AA. G G G A 0 AAA- 

(2) INFORMATION FOR SEQ ID NO: 131:

( A. ' TEN" GTH : IS-' base :a:rs 
TYPE: nucleic acid
STRANDEDNESS: double
AGAACTMTTC GMGNAGACAG CCAGGAGCAT TGAGAGCACC CTGGACGACC TCTTCCGGAA
TTCAGACGTC AAGAAGGATT TCCGGAGTGT CCGCTTGCGG GACCTGGGGC CCGGCAAATC
CTTCCGNNNC ATTGTGGATG TCCACTTTAA CCCCACCACA GCCTTCAGGG CACCCGACGT 240
GGCCCGGGCC CTGCTCCGGT AGATCCAGGT 270
(2) INFORMATION FOR SEQ ID NO: 133:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 529 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 133:
CTTGCAGTAC ATAGCATTGT TATTACTGAT AGCTTTATAA ATCTGCCAAA TAACATAGAA 60
TGTAGCCTCA AAAGGATGGT CGAGGGTTCG CAATCTTTCT TTCTCCACCC AGTGGTGTGG 120
AGCAACTCTG TGCCTTAAAG AGGGCACCAT GGAAAGAAAC AAAAAGGAAT CTCTTTCAAA
ATGCTGGAAA TTAGGCTTAG CTCACTACTT TCAGGATAAA GACAACTGCA TCTAATTAAG
TCCACTCCAC ATTTCTTTGG ACTCTAAGTA TTCTGCACCT GAAGGCTAAA TTGAACTGGC
TCAGCCCTAT CTTTTTTGCC ACATCTTTAA TTACAAATCT ATTTCTTCTT CCTTTCATTT
ACTTCTCTTC TCTTAAGTAA GAAATGTGGG AAATGAGACT GGCAGTTTGG TTTGTTTGCA 420
TGTGGGTGTC CATTAGGCGT CTCATCCTAT GGCCCTTTTT GGAAATGTTG CCTTCCTACT 
ACACACCTGG GAGGTTTCCC CAAGGCTCAA CCTTTTTGCT TCAGGTAAA 
(2) INFORMATION FOR SEQ ID NO: 134: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 437 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 134:
GACGGTGGCG ACGCGTGCAC CGGGGATGTG TCCTGCCACC AGAGGAGGTG TGCG7GGCGG 60
GGAGCAGAGG GGCTTTGTTT CCCAGCTGAA GGTGCGGCTT CTTCACTCTT AGAGGTGCGT 120
GTGTGGGTGG GGGTGCTTGC TGTTGAGGTT TATGCCTGTA ACTGACAGCT GTCCCCCAAG 180
CCATGCTGGC AGTGTGTAGG TGTCGTGCCG GCCACCGCAG AGGAATCCTC TGGGCTTCTG 240 
TGGTTCAAGT GGGGCCCAGC GCAGAGCTCC ATGAGTTGCT GAGCAGCCAG CCCTTCAGCA 
TCTCCTGGGT TTTGGCAGCA GGAGGCGTCC CCTTGTGCAA TTCAGGGGGC GGJGGGGGGT 
GGGGGCACTC GTAGCAAGGT AAAGGAGCCC CTGCTCAGGC CCTTGTTTGC TCCCCTTTCT 
TGCAAGAGGG GTAGACG 
(2) INFORMATION FOR SEQ ID NO: 135:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 534 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 135:

GGCATTGTTC TGGTGGGTGT GTCACGCTCC CAGAAGACTG AATTTATGGI AGGATCACTC 60

GCAAGGCCTT CTGAAGGACT CTTACCTAAA ACAAAAGAAA TATCAGGGAC TTTTGTTGAC 120



YENOE DESCRIPTICN SE^ ID 



- s\ o ■. — ^ ^- v.- ■; — ^ . _ ^ ^- AAAAA. G G 0 AA 



~ -"- O »j: \_ * ^- ^ ~- vj- ^ ^ , 



• - _ n ^ w ^ . . . „ . _ . — _ A _ . ^ 



ISO 
210 

300 



'ACAAC TCAGTTTTAC ATTTAAATTC AGGCAGTGTT AATATGCCAA CGTAGGGAAT 
GTGCCTTTTT CAGAGTTGGC CAGGAGCTCC TGGCTGGGAC ACGGAGAGGC AGGTGTGGCG 
TAAGGCCTCA CZCCCGGCTG TGAAGG7CTC I3ATCACACA GAAGOAGCCC TGCCCAGCCT 
GGGTCATTTG C7GTCCGCTT TTCTCIGTGA C2ACAAGCAG CCCTGAACAA CCAGTATGTG 260 
u.Al-A.AG.G AA^AAAGGOTG T2CA.GA.TAAA CCCACCTAAG TGAAATGGGC -20 
CA.ICT0TAA A 2TGGGGTA.C CT0ACTGCA2 AGGTTCTAGG TAGGCTTTCC ACTTAATCTA ISO 
ACTTGAGG" TACAGGTACC CTGTAAAGTT AGTGGGGCTT GTCCTTGATT GTGG 5 2^ 

(2) INFORMATION FOR SEQ ID NO: 136:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 279 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear



- ^ * — - >-* - - — .~ ^ ^ - O ^ w* ^ 



TYPE: nucleic acid
STRANDEDNESS: double
TTTAGGAGAA TCTGTACTAT TTCAGCATGT CCTCCTCCAG CAGCAAAATG AAGAGGAGAA 120
CTAAGTTGTC CATTTAAAAG GTTTGGATTG CACTTTCCTT TCTCTAACAA TATGCGAGTG 180
GCCTCAACTT TTCCATACCA GCATGCATAA TGAATGGGTG CCCAGTGGTC ACTATCTAAC 240
TGGTTGACTG AAAATCTTTC ACTGAGAAGA CGGCTTAGTA ATTCTGAATC TCCTTCACAG 300
GCGCTTCGGT GGAGAGGAAA ATCATCTACC CACTGTCGTT CCTTGTCTTC TGTGACACTG 360
CTCATGCTTC TCTGCCAGTT TTTCCTGTTT AGGGTATTTG GATTTTTGAG TAGTCTGGAG 420
CTCCTAGACC CAAGTATGGA TTTATTACCC ACTTATCTAC CCGATTTGTA TACTGAGGAT 480
CCTATCCAAC AAAGGGTGTA AATCCAGGAT CCGCCTTC 518



(2) INFORMATION FOR SEQ ID NO: 138: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 266 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 138: 

GATTGCAGGC ATGANCCACT GCGCCCAGTC GAGTGGTAAT ATGTTMAAAG GAAACCTTTT 60 

TCTGAGCAGG TCTCAAAAGA GAGGTTAAAA TACTGAGTAG ACCATMCTGT AAACAGATGT 120 

MCTGTTATYC GGGCTTTCAT ATTCCATTTA TAAAGCACAG GCAGAGCTCA GAGTAGATTT 180

AAYGTAACTC TGAAGGGCAC TAGGATTTTC AGAATGGTAA ATAAGCATTG GCTTCACCTT 240

AAATYCAAAT CTGCATTGGG CTTGTA 266
(2) INFORMATION FOR SEQ ID NO: 139:

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 341 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 139: 

ACCTCGCTCA CCGCTCTGAC CACCGACAGG CAGAGCAAAG GATGCGGGAG TTGCCTCTGC 60

TGCCCATCTA AGGGGACGTA GGCAGAGAAG CAAAGGCCTC TGCTCTCCCT CCATCCATCC 120

CGGTGTGCTG GCCCCAACGG AACAGGAGTC CTTCAACTAT TGCCTGCCAG ACACCCAATT 180

TTAGGGACTG TAGTCTGCAT CTGGATGAGC TGGGCTGTAG ATTGAAGTCT CAGAAGCAGG 240 

GAAGGTTGGA AGGGGTAGGG TCCCAGAGCC CATGGAGTTA TTGCTGAGAA GATATGCAGG 300 

GGACACATTT CCCAGGGGCA GAGTAGAAGC CCTGGGCCTT G 341 
(2) INFORMATION FOR SEQ ID NO : 140: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 234 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 140:

GTGAAGGGAG TTGCAGAATC AAATTGCTAO ATAGGGCAAA OAAAAAAGAA GGOTTTTTCA 60

AAAAACATTA AATTOACATG CAGTOTOAGA GACTATTTAG GOAAAGTTCA AGTTAGGAGC 120

TTTTAGGATG ZGGGANTAAA ACTTTAATKG GAGGGGAGGG CTTGCTTCTG GAGAAGGAAG 180

AAGOCAGACT TGTTAGACAG TACTCTTAAO TOCTAGCCCA GCCTAGCGTG COOT 234

(2) INFORMATION FOR SEQ ID NO: 141: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 254 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 141:

CAA.CTCA.GGT TAGC.AACTGC A G G AAAA C T T TCTTCATTT7 CACTG.AATTT TAAAGAGAGA 61 

^A^.TwTCAG AGAAACTTAG GZG^A^\GIA AAAGAGAGGC .AAA\ZZT CTT 122 

'^^^'^ GATAOTTTTA tttttatcto TT7CTCTACT CATGTGCTTA AOTGGTGAAA 1 E 0 

^A.TCTGTA G AAA I* A G A T C C7ICTGATTC TGCATCTCAT TTCCTTATGG CAACTACAAC 2^2 

ACGAGGAATO CAGCTOGAAA TGCCACIAAC CCCACATCCA GCACCTGAGA GAGGAAGCCA 302 

wc^^GGGC Z 0 A. CT OA. CO C TGGGCCTGCG CACTGGGGTT GTGG 2 5*, 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2~5 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear



- ^ — . . 



^- _ _ _ _ — 
(2) INFORMATION FOR SEQ ID NO: 143:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 262 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 143: 

CCGCACCTCG GCCAGAGGCG GCTGGAGCAG CTGCTMCCTT TTCCCTGCCG CCGCCTCTCC 60 

AGTCCCTTTT TTAATTACCA CTCCAMCTGC TGGGAACGGG CGAGAAAGAG GAGGAGGCGA 120

GAAACTCCCA CCGACCCACA GAGGGAGCAT GATTTCGGCA ACTTCACCTA TCATTCTGAA 180

ATGGGACCCC AAAATTTTGG AAATCCGGAC GCTAACAGTG GAAAGGCTGT TGGAGCCACT 240

TGTTACACAG GTGACTACAC TT 262

(2) INFORMATION FOR SEQ ID NO : 144 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 384 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 144: 

GGAAAAGCGG GACCCAAACA GTGGTGCTGG GGAAATTTTT CCCTGTCCCC TTTGGAAGGC 60 

TGAGTGGGTG ATGCAGCACA GGAACAAGGC TTGGACGTCA GAGGTCTCAT CTTCACTGTN 120 

ACAAAGCATA AAGGACTTGG GGTTGAGCGT GTGTNTGGGC TCAAGTGACC ATGCAAGTCC 180 

TGTCACCTCC TTCCTAAGAC CCCATCCTTC TCCCAAGTCC TCCACAAGAG CTACCTTCTT 240 

CAAAACAATA ACAGAAACAC ATCAAGCTTG GGCGTCACTG AATTCAAGTT CTGATTTCTC 300 

CCGTCACCCC AGCAACAGTG CCCAGTTTGA TTGTGACACT TTGACCCAGC ACTTGGTTTT 360

GAATGTTCTT TTCGGCTTGT ACCG 384 
(2) INFORMATION FOR SEQ ID NO: 145: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 324 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 145: 

CTACATGGAA TCATAAGTKT TCCTAAAAAA GGAAGACAGA TTTGAAGACA GAGGAGGAAG 60

GTGATGTGAT GATGGAAACA AGGGGAGAAA ACGCAATGTG ATGTGGCCAC GAACCAAGTA 120

ATGAGGACAG CCTACAGAAG CTGGTCAAGG CAAGGAAACA GATTCTCCTC TAAAGTCCCT 180

GGAGAGGGCC TGGCCATGCT GACACCTTGA TTTTKTCCCA GCAGAAACTC ATTTTGGATT 240
CTCTGGGACC CTTCAAGCAA GTCAGGTGGA AGAAGGTTTC CCCACCCCCC ACCAGGCCTG 180
TTTGTCCCAG GTTGCCCTAG GATGGAGGCA GTTCAGACCC TGGGTCACTG ATGCTTGATA 240 
GGAAGATCTT TGATATCAAT GGCCTAAGCT CTGCTCAT 278 
(2) INFORMATION FOR SEQ ID NO : 149 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 368 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 149: 
TTTTTTTTTT GTTTTCAACA AACTTTACTA AATAACCCTG GAAAGGCAAT GAACGATCTG 60 
ACAATTTAAG CTCTAATGAT TTAAAGCTCA GCTAGAAGAA AGTGAGGCAT GACATATACT 120 
GTCAACGGAG GGTGAAGGAG GCAGATTTCT GGAAATGCAA TGATCCCACA CATTTGCTTC 180 
AAGGAGAAAC CTGCAGACAT ATTTTCAGGT CTTGCTAAGT AACAACTGTT TATTTGTAAT 240 
CAATACATTT GGGGAAAGTC TGCTATGTAG CTAAGGTCAC TGTGACCACA GACCAACAGA 300 
TGGAAAGGAA AAAGGCACTG GACCAGCAAG GAAAAATACA TCCCCATCCT CAAAAGAATT 
TTAAGGTG 

(2) INFORMATION FOR SEQ ID NO: 150: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 367 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 150: 

TTGTGAAATG GGCCTGGGTA GATAAGGAAA AGAACCTCCA AGAGGTTAAG TGATTTGCGG 60

ATTTGCCTAA ATTATACAGA AGAGTCAGCA CCAGTGCCCA GGCCTTCTGA TTCTTAGTGC 120

AGTAAACACT AAGCACCATC ATTCCATTTC ACCACACTCC TGTCTTGCTG TTGTCCTCAG 180

CTAAGAAAGC CTACCCCTGA GTTACCCTCT TCCATCTTAG AGCCTTCCTG CTCGCTGTCT 240 

GCCCCCCTGC GATGGGGACT TCTTTGGCCC TTCTCACCCA GCCCAGCCTC TGCCCGTTTT 300 

CCTTCTCCTT TCCACTGCGG CTGAGCTCTT TTCTCCTTCC GAGAAGCCTT TCCTTCATCT 360
TTCCTGG 367

(2) INFORMATION FOR SEQ ID NO: 151:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 366 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 151:
ZGZAGZGGGG CGCCTCCCTC CTOT0T00TC CATAGGTGGG GGZTGGGGGZ CTTCTTTTTT 60 
TTTTTGTOTT GCAGGGCAGT TAAAGTTOTO CATTTGCCTO TCT OTTO AO A CCCAAATGCC 12 0 

.AAAGGAOAOT TTTOOTTTCT TTTGTGGGTA GTTGCAAAAA AAAAAAATTC CTATGGGTTA ISO 
OTOOOAOTTT T AAAT A C T T T G T AA 0 T T AAA GGCAAAGTAG TAT GT 0 AC TG TTTOTTTTCO 2-0 
OTGT.AGTTTA CTTTTGAGGT TAAA 2ATOTT TCOATGTCTT TATTGGTOAA ATAOAGTTCO 300 

~ i C IOTA 0 AAT G T T A_AT COT A«T AT G G A 0 0 ATT TT T C 0 T AAT G G GAT T A 0 0 G AT TTT 3 6 0 

TTIAAA - ^ £. 

(2) INFORMATION FOR SEQ ID NO: 152: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 269 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

" N 0 r 

^ * ^ -~ ■* ^ ^ * ^ 0 * o u ... ^ * Cr.ov7 ; jo^^ i ^ ^A^^G^TTGG 0 T G G T 0 A 0 0 A G Z G AG G G GG G 
^^^w^^^ iuAATTTAGG GAOCOCAGOA TOTOACAGGT TTCOCCTTCO ATCTTTCCOA 

bi^o^ACiGT GTCTGAGOAO gtgtgocoag gtgaggttgt atocactgtg tctgagcagg 

- ^ - ^ — - -**- - - - v: ^ ^ ^ ^ r. — ■ - — w- _ ^ ^ cvjAu'^AoG. ^ G G 0 ^ G T T G C A G G T G G AAG 

(2) INFORMATION FOR SEQ ID NO: 153:

(A) LENGTH: 262 base pairs
(D) TOPOLOGY: linear

~ - ~ ^ ~. o j: — _ _ _~ _ _ ^ * ^ * ^ w- ^ ^j- ^? ^- ^ ^. L - _ ^ _ A_AA_A w- ^- .AA G ^ G ^ ^ A ^ 

- ^ ^ _ _ _ w - „ o _ a ~ A AA „ AN ^ AT 0 C A 0 AT G G 2 0 A. G 2 2 N 1 A. Z AA. 2 T T 2 T T 2 
(A) LENGTH: 405 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 154: 

TGGAACTTGT GAGTGGGGAC CCATGATGTA TGGGTCTCAC CTGACTTGAG GTGAATTTTG 60 

GAGTGAAGGG CCCTGAGGTC AGCTCCCAGG TCGGTCGTGC TGGGCChGGC CTGGTTTTCA 120 

CAGGGGCTGA AGGATCCCAG TCCACCTGTG TGCATGTCAG GGCTCGGCCG GGAAGAAGCC 180 

AGCAAAGTCC CCCGTGTCCC TTGCTGAGTA TTCTGTCACA GACAAGCCTC CATTAAAGCC 240 

ACAGCAGTGC TACCCACCAC ACACACCTTG CTGGCCCGGC CACCACTGCT GGCTTCAGCC 300 

CCTTNAGCAG CCCATGGNTT AGCAGACCCT CAGATGTAGG TCAGTGGCCT TANCTGTNTC 360 

TATCCATGCT GTTAAACTCC CTGCCTCCAA CTGGGGGTCA CCAGT 405 

(2) INFORMATION FOR SEQ ID NO: 155: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 400 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 155: 

CCATGATCTT ATTTATTACA TCTAGTTTTT CTTTATACCT CTAAAAAAAA GTGCCTTTTA 60 

GATTTACAGC TTGTGCTTCT AAAGCAAAGG TTAAAACATC ATGCCCCAAA GGAAAACAAG 120

GTAAAAAGGA AGCTGCCATA TAAGCTCTTA AAANTTGTAT GTTACAAGGT TCTAAAATCT 180

CTTCAGCACT GGTTGGTTGG TAGATTGTAC GACACTGACA TGGTGCTTGG GAGGGTCATT 240

TATCTGATGG TTGGAGCAGC ACCATGGGAA AGCTGCCCAG ATGGTCTACT GAAGTCCTTG 300

GCTGTGCACA GAATGGGCCC AAGGGCCAGN AATTCATGAG TCCGGGGAAC TTTGGNGGTC 360

CTTACTCAAT CTCCTTAGTG CTAAAGNTTC AGAGTCTCAA 400
(2) INFORMATION FOR SEQ ID NO: 156: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH : 443 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 156: 

GTCCTCTGGA TTGCTTCGTT GGTTGCGAAC TTTAAGAATG GCAAACTGTG ATTGGNTCCG 60 

ATTAAGACAA GCTTTGTAGT TTTCITCGTG T AAA C.-.CC.A-. ATCCCGCCTG GGCCATGAGG 120

TAGCAGAAGT GGGCCGCATC CAAGAGGCCC CTTCA-.::-.-. C^CZCZCGCC CATGGTAGCC 180
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 159:

CTGTCAGTAA TGGCTCACTA AAGGGCCAGC AGTTTAAATT ACACAGGTTG CACTAAAAGC 60 

TGCAGCTTTG GCCAGGCAAG GTGGATCACG CCTATAATCC CAACACTTTG GGAGGCCGAG 120 

GCGGGCAAAT CACCTGAGGT CAGGAGTTCA AGACCAGCCT GGCCAATATG GTGAAACCTA 180 

AGCCTCTACT AAAATTACAG AAATTAGCCG GTCGTGGTGG CACA 224 

(2) INFORMATION FOR SEQ ID NO: 160: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 377 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 160: 

GGAGGCTGAG GCGGGGGGM CACGAGGTTA GGAGATGGAG ACCATCCTGG CTAACACAGT 60 

GAAACCCTGT CTGTACTAAA GATACAGAAA ACTGGCCGGG CGTGGTGGTG GGTGCCTGTA 120

GTCCCAGCTA CTTGGGAACT CGGGAGGCTG AGGCAGGAGA ATGACCTGAA CCCGGGP.GGC 180 

GGAGCTTGCA GTGAGCAGAG ATTGCGCCAT TGCACTCCAG CCTGGGCGAC AGAGTAAGAC 240 

TGTCTCCAAA AAAAAAAAAA ATAATAATCA AAGCTCTTGG ATTTATAGTT TGGTCCCCAG 300 

CCTTGTTTTG ATCTTTCCTT TATCCTGTTT TATTGCCATT TACCACGTCC TTTTGGAAAC 360 

ATCCCTTTCA ACTGCTG 377 
(2) INFORMATION FOR SEQ ID NO: 161: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 273 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY : linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 161: 

GCAGCGGCGC CGGGCGAGGA GGCGGCAGGG GCGAGGAGGG GGCGGCGGGT GGCGACCCGC 60 

AGGAGGCCAA GCCCCAGGAG GCCGCTGTCG CGCCAGAGAA GCCGCCCGGC AGCGACGAGA 120 

CCAAGGCCGC CGAGGAGCCC AGCAAGGTGG AGGAGAAAAA GGCCGAGGAG GCCGlGGCCh 180 

GCTCCGCGCT GCTKGGCCCC CTTCGCGCGG GCCCGGCGCG CCCCCGGAGC AAGGAGGCAG 240 

CCCCCGOGGA GGAGCCCGCG GNCGCCGCAG ACT 273 
(2) INFORMATION FOR SEQ ID NO: 162: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 286 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 162:
7777GGTGAA A7AAA7CAGA GTACTACAAT CATCAAACAT CTGA7TCA77 7AACA7G7GA 60

G0A7O7A7AO G7G000A777 G7G7GAA7A7 7GAG7A7A7A 7C7CA7A0G7 A77G7GA7G0 120
C770A777A7 7G7GG77A7G GC7G7AGA7A TGGAAAAAAC AG7AGC7GAG ACA77777A7 180
7A7GAAG7A7 A77AIAGC77 AATCAATCAG 7CAGAAAA7G C77AGGAAGA AGAAA7GCA7 240
GA77G7AAA7 GCA7GA777C AACA7GC7AC CCGGCCAACA AAG77G 286

(2) INFORMATION FOR SEQ ID NO: 163:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3L2 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 163:

70 0 0 OAAGGA AGACA0AA0A 7GGAGAA0CG TCAAGGCAGG AACCOCACAG AC7G7CCC77 60 

OOAGOIOACA C707G0CAO0 7C07GG0G0T GTCCCAATTC 7GAG CCAAGG CC7C0CCGAG 120 

o^A^AA^TTO C0TGG7C070 TGTCCCCACA G7GAGC7GAC 7GGGGGTGAG GGAGAAGGAO ISO 

^- ^ v_- ^ ^ ^ . . . o _ ^ .-. w . . OTG ICOO 7GAGAA 0 TT 0 GTGGTGA OTG 0 0777 GO G AG 2-2 



• w- * ^ v_- ^ ^AOAGG OA GGGG.AGC7G AG^^OGTGGG AGA000 0777 77770 OC OCA 

-w^^«.-.^- A^-^- W AA 0^ 0 0A7CAG7AG0 AG7G7GG7G7 77 

(2) INFORMATION FOR SEQ ID NO: 164:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 292 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 164:

_ AAAA _ A A 7 07 A 0 G G A 0 0 77 AA* 0 0 G A 00 2 2 A 7 0 07 2 A 0 

~ " * " * ~ w ■ " ^ w ^ - — - - ~ — ^ - ^- - ^ - - w ^ -v_7 w ^ A ^ t v_- * o A G 7 7 2 7 A 

~ . . . ~. ^ w . AA. . . _ A 2 2 2 A 2 2 A 7 A 2 A 2 AA. 2 2 7 2 A 2 7 A7 A 7 2 7 A 7 7 2 A 2 2 2 0 2 7 I 

- - - - - ~ - - „ A ^ AA V.-2G07GAG A 0 0 7 0 7 G 0 A 2 A0I0722272 

- - ~- _ A -7 A 



^00

(A) LENGTH: 406 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 165: 

GTTATAATTA TCTTGTTTTA TTATTTATTG TTTATCTCTT ACTGTGTATA ATGTAGAAAT 60

TAAACTTTAC CATAGGTATA TACATATTGG AAAAAGCATC TTATATACAG GGTTTGTTAC 120

TATCTGTGGT TTCAGGCATC CACTGGGGGT CTTGGAACAT ATCCCTTGCA GATAAGAGGG 180

AACTGCTGTA TCCATAGAAT AAAAACACCC CATCTTGAAG ATAGGAGGTT CTGTAAATTG 240

GGATGGGGTC AGGGAATCTG AATTTTAAAA GTTTCCCATG TGATTTGATG CCCAGCCAAG 300

GGCTGGGGAC CACTGTCTTG AAATATAATG CTGAGGAAGA TACTGTCTTT GGATTTTCCT 360
GGTAATTCCG AGTGCAAATT CTCAGGCTGG AACCTTATGG GCCTTG 406

(2) INFORMATION FOR SEQ ID NO: 166: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 453 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 166: 
GAAAACTTTG CCATGGGTCA GTTTTATTGG AAGTTCATTT TCCTGAATGT TTGGAAGAAA 60
GTCTAGTGAC TCAGGATAGC ATTTCTAATT TCACAGAGTT ATTTTTCCGT TATGAAACAC 120
AGATTGCCTT TGAGGTCTCC TGTTTCTACT ACTGCCCCTC ACTTTTATGT GGGCCTGCTC 180
TTTCCTTTGT TTCTGGAGAA CCTTTTCCTG TTCAATTCTG TTTTAATTTT CAGCAGTTTT 240
TTTTCTGTGT GAGTGAGGCT GTTTCCTAGC AGGGAGGTCT GGTTGGTCAT TTTCAAGTTC 300
ATCAGGGCTT CATCAGGGCT TGTCCACTTC AACCCTTACG CTATAGGNCC CTNTGCACCA 360
TCTGCANTCT TCAAAATGTG CCCACTGGTT CGTTCCCATG GANGGCTTGT TGGTAATTTG 420
GGCTTTTAGG GGGGGCCATG GAAGGAGCAA ATC 453
(2) INFORMATION FOR SEQ ID NO: 167:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 285 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 167:
TTTACTCTTA AAACTGTTAC AACAGAATCA TGGACTGACA CAGGTAATGG CTGAGCCATA 60
AGCAAATCGA GAAGTACAGA AATGTCCCAC CCCAAACAGC TGCGGAGTAC ACATCACACA 120

TTTGTGAGCC AGGCCCTGTA GGAGGGATTG TGGATGGCAA AACCTCAGGT TCTGCCCAAA 120

TCCTCCCCTT GGGGGCTGGP, GGGTCTCTAG TTAATTGGCA TTCCGGTGCT TAAGGCCACT 180 

XT T G G G TAG A GGTTTGGCAA GGATGGAGTG TCCAGACCTA TGATCCTCTA AGAACTTTAC 240 

CTTTTAAAAA CAGCCACCCA AATGGTGGTG GCGTGGGGAG CAGGTGGTGG TGAAGGGACT 300 
GGGGGTGTCT GGCCATKGCC ACGTACCAGA GGAGACTCTG TGAGCCCTCT CCCTGCCTGA 360 
GGGACACTTA ACTTTTATAG CACTACATAG GGTCAACG 398
(2) INFORMATION FOR SEQ ID NO: 171: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 321 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 171: 

AGACAGCATC TGGCTCTGTC ACCCAGGCTG GAGTGCAGTG GCGCAATCTC GGTTCACTGC 60

AACCTCTGCC TTCCAGGTTC AAGTGATTCT CCTGCCTCAG CCTCCCAAAT AGCTGGGATT 120 

ACAGGCATGT GCCACCATAC CCAGCTAATT TTTGTATTTT CAGCAGAGAC GGGGTTTCAC 180 

CATGTTGGCC AGACTGGTCT CGAACTTCTG ACCTCAAATG ATCTGCCCAT CTAGGCCTCC 240 

AAAAGTGCTG GGATTATAGG TGTGAGCCAC 1GCGCCJGGC CCTTGGGTAA ACACTTCAAA 300 

TGCAMCCAAC CATTAAAGGT A 321

(2) INFORMATION FOR SEQ ID NO: 172:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 293 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 172: 

GAAACTTATA GTCTTGCCTC CCAACCTTCT GAACACTCCA GTAGAAAAAT CTTCTCGCCT 60

ACCTTTATCA CCCCACGACC TACTAGCATT TCTTACTCTC AAAAAAAATC TTTTCTGAAA 120

AATCAAGACA GAGTGCAAAC AATCAGCATA ATTTTATTAT GACARAACTT TTAAATTTTA 180

TCCCCCTCTC TGAGAGKTCT GCTAGGACTC CTTCAGATAA GTGAAAAAGA AAKTTTTTAA 240

AATTTATTCT CAAATCCGAA TTCCAATCTG TATAAAAAGG GCGATTCTCC CTC 293
(2) INFORMATION FOR SEQ ID NO: 173: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 282 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(A) LENGTH: 381 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 177: 

ATTGGGACGG GCCCCCCTCT GAGGCGACGG ATCGATAAGC TTGATATCGA ATTCCTTGAT 60

NTTTTCTAGT GTTATGGTTT TCTCCCACTC CAATAACTWT TCATACCTKT GGTCTKAGTT 120 

TTTCCATCTA TAAAATCATG TGCTAAATAA TTAACTATCA TCTCTATCAT TGTCAGACTA 180 

CACAAAGCTT CCAGCCTGGG CAACAGGAAC CCTGTCTCTA AAAAAAATAC AAACATTAGC 240 

CAGGTGTGGT GGTATGCGCC TGTATTCCCA GCTACTTGGG AGGCTGAGGT GGTAGGACTA 300 

CTTGGGCTTT AGAGGTCAAG GCTGCAAGTG AGCTGTGATT GCGCCkCTGC ACTCCAGCCT 360 

GGGCAACAGG GCAAGACCCT G 381 
(2) INFORMATION FOR SEQ ID NO: 178:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 443 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 178:

GATTTTATTC AAACACAGGC AAGAACAATG ACCTTCAGAG CTGGGTAAAA ATAATAAGTT 60

AAAAGCATGG TTAGAATTTT AGACAATCAG ATAAAAAGTT TGAAGGAAGT GATTTCCCCT 120

TCCTCTCCTA ATTGATTAAT TCAACACAGC ATAAAAATAA TTTGTATCTA TAAAATATCC 180

TTGTTCCCAC ACAAATGAAC TGGAGGTGGC CCTAGGATTT CCTTGACTAT GCACAATGCA 240

CACAATCTAC ATGTCCCTCC TCCCCAACTT TTAAGGCAAA AATGGTCCTG CATCTTCAGG 300

CAGAGGGTGG GCTCATGCCA GCAGTCAGCT GTGGTCAAGG ACACTGGGGG TGCGTTTYCT 360 

CCACCGAAAG ATGCCTGCTT TGGGTCCACT TTGGGCGCGG GATCCCATTT TATTTTCTAG 420 

CCTGTGCCTC ACCACAGGGA AAA 443 

(2) INFORMATION FOR SEQ ID NO: 179: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 325 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 179:
TGGGGGACCA GCATTGCTCC CAGCTGAGGG CGCCGTCTTC CTCACCACGT ACCGGGTCAT 60 
CTTCACGGGG ATGCCCACGG ACCCCCTGGT TGGGGAGCAG GTGGTGGTCC GCTCCTTCCC 120

ACTAGGCTGG TCTTGAACTC GTGAGCTCAA GTGATCTGCC TGCCTCGGCC TCCCAAAGTG 240

CTGGGATTAC AAGCGTGAGT CATGGTGCCT GGCCTAGTTT GCTCTTATTT TTTTTCCATC 300

TTTGCAGTTT CTAGGCCACT GGGAACAGGC TGCAGAGCTC AGAGTCCACA GCTGTGAGGC 360

TCCATGTTGC ACCATCAAAA AATAAGGTGA CGAGAGTCCT GGGTTTCCCA GTGTCACGGC 420

AAGAGGGGTT ACTGCTCACG GGTACACACA G 451

(2) INFORMATION FOR SEQ ID NO: 183: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 444 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 183: 

CCAAGTTGAC CCGCCGAACC ACCGACAGGA AGAGTGAGTT CCTGAAAACT CTGAAGGATG 60 

ACCGGAATGG AGACTTCTCA GAGAATAGAG ACTGTGACAA GCTGGAAGAT TTGGAGGACA 120 

ACAGCACACC TGAACCAAAG GAAAATGGGG AGGAAGGCTG TCATCAAAAT GGTCTTGCCC 180

TCCCTGTAGT GGAAGAAGGG GAGGTTCTCT CACACTCTCT AGAAGCAGAG CACAGGTTAT 240

TGAAAGCTAT GGGTTGGCAG GAATATCCTG AAAATGATGA GAATTGCCTT CCCCTCACAG 300

AGGATGAGCT CAAAGAGTTC CACATGAAGA CAGAGCAGCT GAGAAGAAAT GGCTTTGGGA 360 

AGAATGGCTT CTTGCAGAGC CGCAGTTCCA GTCTGTTCTC CCCTTGGAGA GCACTTGCAA 420 

GCAGAGTTTG AGGCTCAGCA CCGA 444 
(2) INFORMATION FOR SEQ ID NO: 184: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 399 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 184: 

GGCAGAAAGA GGAAGGAGAC AGTGCCAGGA GGAAGAAGGA AGGAGTCCCT TAGCTCTCTT 60 

CATTGTCCCC TTTACTTCCT GCTATCTTCT TCTCCTCTTC TTCTCTCTCT TGCCTNTATG 120 

CCTGTATTTC TGGCAATATG ACAGGCCTGC CTACCCAAGA TCAGAACTCC AAAACCACTC 180

CCACCCCTGA AGGTCGGGAG GGTCTTAGCA GCCCTGGGTG GCTGCCTGTG CTCAGGTCCT 240 

CACCTCCATG GGAAATAAAA ATGGCACCCT GAATCTCTAG GATTTTGTCA CTTTGGAGTC 300 

ACAGCAAAGT TCTCTTCCTC TTGTCCCCCC GT77GCTGC7 CCTTGGGTTA TAGGACATGG 360

TAAATATTTA TTACTTTCAG GGAACCAGTA TTTTATTAC 399

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 284 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 188: 

CCAGCAACTC AAATTCACCA CCTCGGACTC CTGCGACCGC ATCAAAGACG AATTTCAGCT 60 

ACTGCAAGCT CAGTACCACA GCCTCAAGCT CGAVTGTNAC AAGTTGGCCA GTGAGAAGTC 120 

AGAGATGCAG CKTCACTATK TGATGTACTA CGAGAKGTCC TACGGCTTGA CCATCGAGAT 180

GCACAAACAG GCTGAGACCG TCAAAAGGCT GACGGGATTT GTGCCCAGGT CCTGCCCTAC 240 

CTTTCCCAAG GAGCACCAGC AGCAGGTTTT TGGGGGCCAT TGAG 284 
(2) INFORMATION FOR SEQ ID NO: 189: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 215 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 189: 

GGAAGGATGA GAAACAGATT TCTGCTCACT TCATGGGCTG RCCTRGRATT GACGATGGTR 60 

CAAACCCAAG ATTATCCTCA TGTAATTTAT GAAGATTATG GAACTGCAGC GCATGACATC 120 

GGGGACACCA CGAACAGAAG TAATGCAATC CCTTCCACAG ACGTCACTGA TACAACCGGT 180 

CGGGCACATC TCKCGGCCTA TGCTGCCGGT GGTGC 215
(2) INFORMATION FOR SEQ ID NO: 190:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 153 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 190: 

TTTCATATGG AAAGAGCTAG TACAATCACA TATTTGAAAG GAGAAACAAT AGGTACTGAA 60

CCGGAGGGAA AGGGCGAGGG TGAGTGTGCC AGCkCCGGCC TGGTGAATCC ACGATTCGGT 120

TTCCCATCCA AGGGTAAGTT TCCCAAAATA CCG 153

(2) INFORMATION FOR SEQ ID NO: 191: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 316 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 191:

_ A^-A.^AA ^ T C TGT G GAA A ■„ A . A . C A ^ C iC.T C A CAT G C . G CAGAGTT
^ AAAGAGCAG GTAGTAGACA GACCAAUACC AGTTTCGCGT TAAGGCTTTT 

(2) INFORMATION FOR SEQ ID NO: 192:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3" base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



— ^A-.r„AAwvrAA wGGGAGAG7G GGGA7GGAGA GGG7GAGGGA GACCAGAGNA

G7A777A7AG A777A777A7 A7A7G7A7A7 77AG77GAGA NGAAACGAAC A777CGGGGA 60

CAGGAA^CAA GGkGGGGGGG GGC7GG77GG C7GAG7GGG1 ACG7CAGAG7 CAGAG77GGG 120

AGA7GACAAA TACCAAGCTC AGGG7GAAGA AG7GGGAG77 AAC7GGGAAG 7AGGGKGGGG 180

7G7A7GGACA CGGAGGGT7G 7AAGGG7GCA CGG7A7GGGC AGKKGGT7TG CAC7GGGAGG 240

CGG7A7G7AC AGG7TGAAAG C7AGGGGTGA GAT7AGGGCA GTGAG7AGAG GAACA7AGG7 300

u r*A-A G i T ^ A G AGAAGA t 1 a 
(2) INFORMATION FOR SEQ ID NO: 192: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 360 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 192:

: TTATATGCA GG7777GAC7 AGCA7G7A77 G7G7G77TTT CTGOTCTATG 6 0 



AA.AATTTTA TATTTGATG C TACTTOTTGA AAGTTTACTC TTTGATGGTC T AA G A G AA C A 121 
-■-■-AGATGGT TTATATGAAT A^ANCTTTATC TGGAGGATGG TGGATTGGTA AATNAGGAGA 15i 
G AGATATCAAG ACTTTATGTGT G G G AA. 0 T AAA. A.7A7A.7A„A7G CCA-AATGTGT 240

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 225 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 194: 

GATTATTGGC TTTGCTTTCA TAACATGTAT TTTTAAGTAT TTACTCTCTT AATGGCCCTC 60 

GTGTCTATTT TATACATCAT ATCTCTTAAT TCTCTAGATG GAACACTGAA GGACAGGAAT 120

TAAGTAAGTG ACTGGCCATG CAAGGGTTGG AAATTTTACT GTATCCCTTC CTCRGTAGAA 180 

GTTATGTTAA ACATTCAAGC AACCACATAT CTAACAGAGG AGTTT 225 
(2) INFORMATION FOR SEQ ID NO: 195: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 294 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 195:

ATTACTAGAT ATTTGTATGT TAAATTATGT GGGTTTTCAA ATTTGTGGAG AATAAGTAAT 60 

AGTGACATTA GTTTAAGGAC AGTGTTTCAT CAGGGCATTA TTTTAATGAA TCTTATATTT 120 

AAATGTCTGT TTCAGGAATT CATGTGAATC TTTCTTTTTA TAGAGGACCC ACAGGCATGA 180 

NTTATTTACT CCTCCGGTGA TAGGTTCTCA CCCTGATGAA AGCGGAAGCA AATTCCAGGT 240 

TAGAACATTA TNCTAGTTAT GTAGGGGGGT ATAAAGTGTG TAAGTTTAAT ATTT 294 

(2) INFORMATION FOR SEQ ID NO: 196: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 233 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 196: 

TTATTTTTCT CTAAATTTTA AAATAGAAGA CTTTAATGGA AAACATTTAG TACCATCATG 60

TCAMCCTGAA TGCCAGCAAT ACCTCGACTT TTACACACGC AGGAAGCCTA GTAAAAGCCC 120

CGTCAGTAGT ACACATTTCT CTATGGTCCT TCAACAGTTT TTCATATACA AAATTTTCTG 180

CTATTTTTGC TTTTGCAAAC AGCAATAACT TTTGGGTTTC CCATATGACC ACC 233

(2) INFORMATION FOR SEQ ID NO: 197: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 230 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

TGTGCATCGG TCTCTTGGGA TGAAAACTGA TGTGTGTGAT AGGAGTATCC CTTTGGAGCC 240
AAAGGTGGTG AAAGCCCTGC TTCTGGACAG TCCGGCTCCA ATCTGTATAC TGTTTGTCTG 300 
GGATGCTGTA CTCAAATACC TGCTGGTCCG AATGAGCGAT GACAAGGTTG TTTGGTATTG 360

GGGGCAATAG CCATAGCAGT CACTTGGGAA ATTGTAAGCA GGCACCGTGC AGTGAAGTTT 420

TA 422

(2) INFORMATION FOR SEQ ID NO: 201: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 273 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 201: 

ACTCCACGCT GATGAACCCG ACGTCCATTT CTCCAAGAAA TTCCTGAACG TCTTCATGAG 60

TGGCCGCTCC CGCTCCTCCA GTGCTGAGTC CTTCGGGCTG TTCTCCTGCA TCATCAACGG 120

GGAGGAGCAG GAGCAGACCC ACCGGGCCAT ATTCAGGTTT GTGCCTCGAC ACGAAGACGA 180

ACTTTGAGCT GGAAGTGGAT GACCCTCTGC TAGTGGAGTC CAGGCCCCCA GACTACTTGT 240 

TACGAGGGCT ACAACATGTG CACTGGGTGC CCG 273 

(2) INFORMATION FOR SEQ ID NO: 202: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 436 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 202: 
GGACTCCAAC CCCCCAGGAG GCCGAATGCT GAGCTTGGCA ATGGTGGCCT GGATGGAGCT 60 
GATGGGCACA TCCCCACCGA GGACCAGGTC CTGGGAGTCC TGAGGAAGGT GGTTCTTCTG 120 
GCTGATGCTT GCACTGGCCA AGGGTTTGCA TGGAGGAGGC ACACCATGGC GCTGCAGGAC 180
CTGCTCCACG TGTCTCACCA CTGCCTCATA GCAGAACCTG AGGTGCAGCT TCTCCTGCAG 240
CATGTGCTTT CTCTGCTGCC GChTGCGCCG CACCAGCTGA GGCAGCTCAG GGATTCCKTT 300
CCCAGCCTCC ACCTCCTGCA CAGCTGCATA GAGCAGTGCA AAGGCTCCCG TGCGGCCC^C 360
ACCAGAGCTG CAGTGCACAA TGATGGGCGT TTGCAGGGGC CGTGATGCAA GGTAATTTGC 420
GTGCACCTCC TGGGTT 436

(2) INFORMATION FOR SEQ ID NO: 203: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 336 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double



(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 203:



;tag 



TACGCCAAGG 



CGTTC 



j:\J3 v-r 



aa gicaggttcc tgtctcac 



to? 



- J- -L -L. i 



ccagcccatg tktcccctct atgt 

TCTGGTGTCC TTTCTGTAAT CAGAGCTGCC GTGA 



ttg: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 393 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double



i - 1 U EN C E D E S C Pvl P T 1 0 N • S E r: ^ N r 



Lb; L L AA t 



3AGGCA

GTGTTCACTC TGGGTGACTG TNACGTCATC CAGGCCCTGG TTCTCAGTGT CCCACTCATG 360
GACGTNGGGG AGACGGCCAT GGTCACTTCT 390
(2) INFORMATION FOR SEQ ID NO: 206: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 172 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 206: 
CTTTACTGTG GGTGTGGGTG TCACTGTCAC TGCCACAGCC ACTNGGAGGG ACACACAGCT 60 
TTAACCCCTR TTTGCTTAGG NGAAGGGTGG GGGCATTCAG GGTTATAAAA CTAACTATAT 120 
ACACAGAAGG TCCTAGGKAG AAAGCCACCC TGAGCACACA TGTCTAGGCA CA 172

(2) INFORMATION FOR SEQ ID NO: 207: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 215 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 207: 

AAGGCAATTA GAAGATTTAT TGAATATTGG TTAAAAGTAG ATTGACAATG ACATTAAAGA 60 

ATAAAGTGTA ATTTATTTGG TGCTACTTTG TGAATGCTTC CAAGTACAAA TCATCTCACA 120

ATACCATATA CAACATACTT TCAATCACAA CTCAAATATA AAATAACCTA CAAAATCACA 180

TTGCTATAAT CAATATACAA TAATTGTATT TTTAA 215 
(2) INFORMATION FOR SEQ ID NO: 208: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 444 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 208: 

GGAGTTCTCT TGTCCACGGA GAGCAGTGTT GCAGTGTATG GAATGCTAAA TCTTACCCCA 60

AAGGGCAAGC AGGCTCCAGG TGGCCATGAG CTGAGTTGTG ACTTCTGGGA ACTAATTGGG 120

TTGGCCCCTG CTGGAGGAGC TGACAACCTG ATCAATGAGG AGTCTGACGT TGATGTCCAG 180

CTCAACAACA GACACATGAT GATCCGAGGA GAAAACATGT CCAAAATCCT AAAAGCACGA 240

TCCATGGTCA CCAGGTGCTT I A. G AG AT C AC TTCTTTGATA GGGGGTACTA TGAAGTTACT 300

CCTCCAACAT TAGTGCAAAC ACAAGTAGAA GGTGGGTGCC ACACTCTTCA AGCTTTGACT 360

ATTTTGGGGG AAGAGGCATT TTGACTCAAT CCTCTCAGTT GTACTTGAGA CCTTCCTCCC 420

o ^ ^ _ o G A u A ^ G T T TTT T G T A.T T

(2) INFORMATION FOR SEQ ID NO: 209:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33S base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 209:

GCAGATCACT TGAGGTCAGG AGTTCGAGAT CAGCCTATAT ATGCAAGTAC ACACACAGGC 60

ACTCGCACGC ATG2ATGCTC ATGCAACACA CATGTACACT CTACATGTAC AGCTCACATA 120

TGCATCCATA CACATGTGCA TGCTCACCCA TACACCAGCC ACACACAAGT ACTCATACGC 180

ATACATGGCC ACACACAAAG TACACACACG TACACCATAT GCATATGTAT GCACTCATAC 240

ACTCATACAT ATGTGCCCCC TCAGAGAAGT ACACAAGTGC ACGCGCATCA CACATGCATA 300

l ^ ^ G ^ _ C A . G C A. T A. C A C A C G G G A. C AA T T C A. T A. C A 2 A C G 

(2) INFORMATION FOR SEQ ID NO: 210:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 371 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 210:

GAGGAAGTAG AGCCTNAGGA GGCTGAAGAA GGCATCTCTG AGCAACCCTG CCOAGCTTGA 

~ w . o ^ vj ^ . „- ^ ~- - - ^ ^ ^ ^ o A ^2 A A G C G T AAAA G T 2 A G C A T G 2 T GO AA G G G G A. C 

^ ^ ^ ~ ~ -.^ ^- ^ ^- ^ i ^ ^ AA v.* GOT .A C A C A 2 2 A-AAA 0 AA T A. T G T 2 A. A C TT C C C T T T 

- v.- _ . ~. ^ _ ^. ^ ^ AL 2 A - G G I A GTAA G C CC C A. T GTTAA A CT A. A 0 2 G G 0 A G T 2 

o _ ^ - - - - - A ^ . w ^ AAA AA C A AAA 2 2 T G 2 T T T T G G G .A T 2 T 2 T T T G G G

TTTTGCATAG TTGTCAGCAG ATAAATATTG AATGACAAAA CTCAGATGGA GGAAAAAGAA 120

CAAAATAACC TAGTTCTCAG AAAGATTTAA TGAGCAAATG GGAAAATGTC AAAAAGATTT 180 

ACAGACAGGG GCATCTTAGA GTCACTGGAA TCACACAGGC CTTCCCTCAG CTTGAGGGGC 240 

TGCCTGGAGG TGGGGGTGGG GGTACACCTC CTCAGTGGGG AGAGACTTGC CAAAT 295 



(2) INFORMATION FOR SEQ ID NO: 212:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 370 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 212: 



TGGCCGATAT GAGGGGGGTG GGACTGGGGC CCGCGCTGCC CCCGCCGCCT CCCTATGTCA 60 

TTCTCGAGGA GGGGGGGATC CGCGCATACT TCACGCTCGG TGCTGAGTGT CCCGGCTGGG 120 

ATTCTACCAT CGAGTCGGGG TATGGGGAGG CGCCCCCGCC ACGGAGAGCC TGGAAGCACT 180 

CCCCACTCCT GAGGCCTCGG GGGGGAGCCT GGAAATCGAT TTTCAGGTTG TACAGTCGAG 240 

CAGTTTTGGT GGAAGAGGGG GGCCCTAGAA ACCCTGTAGC GCAATGGGGT TGGGCGGGCC 300

AAAGGTTAAG TTTGAACCCG AAGAGCAAAG GAAGAGGCGA TCATCATAAG TGGAGGATTA 360 

GGATTAGGAT 370 



(2) INFORMATION FOR SEQ ID NO: 213: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 302 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 213: 



ATCTGTGGAA TAATCTGCGG GCTAACACGG ATAACTCAGT ATAAGAACCA CCCAGTTGAT 60 

GTCTATTGTG GCTTTTTAAT AGGAGGAGGA ATTGCACTGT ACTTGGGCTT GTATGCTGTG 120 

GGGAATTTCC TGCCCANTGA TGAGAGTATG TTTCAGCACA GAGACGCCCT CAGGTCTCTT 180

GACAGACCTC AATCAAGATC CCAACCGACT TTTTATCTGC TAAAAATGGG TAGCAGCAGT 240

GTATGGGAAT TTTCTCATAC AGAAGGGCAT CCCTCAAACC GGAAACCACA GAGATGCTAG 300

GT 302 



(2) INFORMATION FOR SEQ ID NO: 214: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 354 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 214:

ATGGATGACT GGGZAZZGCG CACAGGGCTG CAGGGTGGAA AAZGZZZGAZ GGZZAGG7GZ 60 

TGACTTGGGG GCAGAGAGCG CAGTGTNGTA GGGGAGGAGA GGZGGZGZZZ CTGCTGCCTG 120

GZAGZCAGZZ TGCCTGTNCT GZGGGZAGAG CAAGGCACTT TCTGCTGCCG GTGCTTCCAG 180

GGCCTAAGCA GCCGCTGCAC ACTCACCAGC GCAAGGCTCC TCTGCAGGGA ACGAGGGCTG 240

CTACCCATTT CACAGATGAG GGCAAGCAAG GACTTGCCCA GGGTTGCCCA NAGCAAGTGC 300

GIAACAGGCC CTGAGAAGAG NGCCAGTGAG CTCATCCTGA GTTAATTATG GGCT 354

(2) INFORMATION FOR SEQ ID NO: 215:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 260 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 215:
T2GTTCAAAG TCTAGGCCCT CTTNAGAGCT GGCTG ATT OA GCTTGCCAAC AGTGACATCA 60
GGGTGAGGCT TCCCCTGTCC ACAGCATTAG CTGCGAATAT CCTCATGGTC ACAAGATGGC 120
.GCCAGTGGC CGTCAGGGTG TGTGCTTCCT TGTTCACATC CAGTGGAAGA GTGACAGCCT 
^__CCTTA GCTCTCTGAC ACCANTGTGA AGGTGCCANG AACTTACTAG CAGGNCTTTC 
-^-.^Ao^C ATT C AA 0 AG G 
(2) INFORMATION FOR SEQ ID NO: 216:

(i) SEQUENCE CHARACTERISTICS:



' TV-" 



base pairs 



(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 216:

- - ^ A * ^ . v^-^ATA ATTCT CTGGA TEA. 0 CTGG OA GAGA CTTTTK 




CTGCAACCAT CCATACCTTT TNCCCGTGGC TGCTATGGAG TCCCCCAAAC TCCCCAGTGG 60

GGCTTATGAG GGTGGGGCAC TTATTANGTN GTCTGGGAAG CTCATGCTGC TCCAGAAGAT 120 

GCTGCGAAGC TGAAAGGAGC AAGGACACCG AGTGCTCAAT NTTCTCGCAG ATGACCAANA 180 

TGTTAGCCTT GCTTGAGGGC TTTCTTAGNC TATGAGGCT 219 
(2) INFORMATION FOR SEQ ID NO: 219: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 390 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 219: 

GATAGGTAGC AGAGACCAAG GCGCAGGGTG CTTCAGATGA GCAAGAGAAC CCAGTCGAAC 60 

CAGATACCCC AGGTGGGCCG GAGGGACCCC AGACCTTCAG AGGGCJGCCC TGGTGTTCTC 120 

CACAGTGCAG TCCCTCTGTA TTCCCAGAGT GGGATCGGGG CTTTCAGCCC ACCCTGATGC 180 

CTGCCCTCCA GGATGGCTGG TTTAGTCTGG GTCCATGTCC CAGACCCCTC TATTCTGCTC 240 

CAGGACAGCA GGACTTCAGG TCTTTCCTGG GGGTGGATAT AGGAGAAAAT TTCTGCCTGG 300 

CACACACCTG GGCTCCAACC ACTTGCCAAG TGATTCACTC TTAGGCCCAG GGGGAACACA 360

ATGACTATCA TTACTGATGC AGACCTGGCT 390 
(2) INFORMATION FOR SEQ ID NO: 220: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 382 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 220: 

TTTTTGTTTT GTTTTAATAT TTTTGATATT CTCTTTGCAT TGAAATGGTA TAAATGAATC 60 

CATTTAAAAA GTGGTTAAGG ATTTGTTTAG CTGGTGTGAT AATAATTTTT AAAGTTGCAC 120 

ATTGCCCAAG GCTTTTTTTG TGTGTTTTTA TTGTTGTTTG TACATTTGAA AAATATTCTT 180 

TGAATAACCT TGCAGTACTA TATTTCAATT TCTTTATAAA TTTAAGTGCA TTTTAACTCA 240

TAATTGTACA CTATAATATA AGCCTAAGTT TTTATTCATA AGTTTTATTG ANGTTCTGAT 300

CGGTCCCCTT CAGAAATCTT TTTATATTAT CCTTCAAGTT ACTTTCTTAT TTATATTGTA 360

TGTGCATTTT ATCCATTAAT GT 382

(2) INFORMATION FOR SEQ ID NO: 221: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 314 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 221:

GAC777G077 7A777AAAAA ACAAGCCAAA AAAAAAAA^, AAAAACCCCA AC777A7A7A 60

CAAAG7CAAA CTGAAACCAC GGWTTATGGA AAGAGGCAAG AV77A7GGG7 AACAGGGGAG 120

aag^ggggc cagagccaa7 accaca77c7 gaacacagga gccacgggaa agagg7gc7g 180
o.77c77c7g gcaagaccgg gg7gac7gga acgcac7gg7 cctac7ggca aacccagccc 240
aacac7gagc 7c777c7agc a7ggac7cca 77cccg7ga7 7ggccaaggg agaccg77cc 300

(2) INFORMATION FOR SEQ ID NO: 222:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 342 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 222:



^ — l ~ v: U \ j ^ n G C u N A G ^ A 



- - - w l- ^ ^ i ^ * o w- u^ ^- ^ ^- G 0 G G 0 G G G C 

- .-. ^ w . ^ ^ u AAG 700C7G GG G 77 A 0 G 7 G G G G G AAG AG 7 1". 

-u- .^^7G2AAC G7G7770CAG 0GAGG7GGGA GCGGGG077G 7GAC7GGGAC 

^^-0 GAGGGGGA27 7G7777TCCT 77 0 CTC7AGA GA' 

* ~ - - r i ^ ^- _ v.- ^ AA.* 1 . ^ A. G ^ ^ .AAG A G 0 AA_A. 2 7 G 

(2) INFORMATION FOR SEQ ID NO: 223:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 376 base pairs
(B) TYPE: nucleic acid

(D) TOPOLOGY: linear

GCACAGCCCC AAGAAG 376
(2) INFORMATION FOR SEQ ID NO: 224: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 445 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 224:

GTTGATAGAC ATTGGCATTG GGGTTGCTTC CACCTTTTGG CTGTCATGAA TAATATTGCT 60 

ATGAACACTA ATGTACAATT CTTTGCCTGA ACGTAAATGT TTTCATTTCT CTTGGGTATT 120 

TATCTAGAAA TGAAATTGCT GTATGTTAAC CCTTTGTTTA ACCTCTTGAG GAACTGGCAG 180

ACTTTTCCAA AGCAGCTGCA CCATTTTAAA TTCTAACCAG CAGTGTTTGA GGGTTCCAAT 240

TTCTCTATAT CCTTGGTAAC ACTTGTTATC TGCCCTTTTG GTTAGAGACA TCCTAGTGAG 300

TGTGAAGTGG CATCTCACTG TGGTTTTGAT GTGCATTTCC CTGATAGCTA ATTGTGTGGA 360

TCCCTTTTGC TTTTAGTGGA ATGAAATATC TGGTAGTCTC GTATGCCAAA CTAAAGCTAA 420 

AATTAAAATG ACTCTGCATG ATGGA 445 
(2) INFORMATION FOR SEQ ID NO: 225: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 403 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 225: 

TGCTCTCGGG ACAGTTTCCC GGGCAGCTCC TGGCCAGCTT CCAGCCCAGA GTCCTCAAGT 60 

CCAGGGCACC TTGGGCCCAG CGCAGGCAGA ATCCGAGGTG GTCCTGGCTC TACCCTGGGC 120 

CTCCTACTCC CCAGCACCCC TGGAGGAGGC AGGGGCTCGC CGCCGCCGAG GCTGCCTGCC 180 

CTAGGCCCAC CTCTGCATGC TGCTCATGGG GCCACCCTGC GTCGTGGGCC CTCACTCTGC 240 

CTAGGGGAGC TGGGCCAGGC A C TAG CC TIT GCCCAGGGAG GTGGGCCTCA GGCTGCCCAG 300 

GTGCCTGCAC GCCAGCCGGG CTTCTCTGGG GCCTCCCCGT CGTCAAGCCT ATATCCTGTC 360 

TGTCCCCACC CCAGCTGTCC CTTGCCAGGG GACTGGCATA AAA 403

(2) INFORMATION FOR SEQ ID NO: 226: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 440 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 226:



j. 1AGG1 TTCT 
TGAGTTGGCC 
GGCTGAGGCG GGCGGATCAC 



^ C CAT v 



i ~AL 



A G G T C A G G A GTTCGAGACC 
.XATC^ CTACTAAGGA TACAAAAATT AGCCGGG7G" 
CAGGAGGCTG AG G C AG GAG A TTTGCTTG-J 
-^-.'^^^ Al-AGGTTGCA AGTTAGGCCG GGr^ZTGCGCZ GTTTGTACTC CAGCCTGGGC 

(2) INFORMATION FOR SEQ ID NO: 227:



: ; ^.icir,,' ^ ^ o case ^airs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

SEDUENCE DESCRIPTION: SEC II 



n. ^- ^ ^ ^ ^ . 



n ^ o o w v 



^ , r Das t

TCCACACTAT TTAACAGGAC TGTGGCAAAA TAGCTTTA 278

(2) INFORMATION FOR SEQ ID NO: 229: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 425 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 229: 

TTTTTGTTCC CAAGCCTTTG TGACTGACTT TAAATCCTCT CACCTGCAGA ACAGAGATGG 60 

CTTCAAAGTG GGGAGTGAGG GAGTGAGCGA GGACCCTGGG CTGAGACCTG TTTTTCTTCC 120 

ATTTCTGCTG TGGCTTCCCA CAGCTCCCTG GTTCCACACC AGGCCCTGCT CTGCCGCAGA 180 

AAATGGATTC CCAGGCCACA GAGCTGTCAG GCCTTTGACT TTGCAGAGAC CAAGCACCCC 240 

AGAGGCTGTG CGACASGGCT AGTCCCTGGT GGGGCGGTCT GGGGCATGGG GGGCAGGGAG 300 

ACTKGGAGAT GGGGAGGGCG TTGAGAATCC GGGGGG7CCT GGATACTTGA CAAATTGGCT 360

CAGGTCTTAG CTYTGGYTGC CCCACTGATT GTGTTGCTTG GCAAGGTGCA AGTYTTCGGC 420
TGTTC 425

(2) INFORMATION FOR SEQ ID NO: 230:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 382 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:230: 

TTGGAGGATG TGCTGCCCCT CCTGCAGCAG GCCGACGAGC TGCACAGGGG TGATGAGCAA 60 

GGCAAGCGGG AGGGCTTCCA GCTGCTGCTC AACAACAAGC TGGTGTATGG AAGCCGGCAG 120 

GACTTTCTCT GGCGCCTGGC CCGAGCCTAC AGTGACATGT GTGAGCTCAC TGAGGAGGTG 180 

AGCCAGAAGA AGTCATATGC CCTAGATGGA AAAGAAGAAG CAGAGGCTGC TCTGGAGAAG 240 

GGGGATGAGA GTTCTGACTG TCACCTGTGG TATGCGGTGC TTTGTGGTCA GCTGGCTGAG 300 

CATGAGAGCA TCCAGAGGCG CATCCAGAGT KGCTTTAGCT TCAAAGGAGC ATKTTGACAA 3 60 

AGCCATTKCT CTTCAGCCAG GA 382 

(2) INFORMATION FOR SEQ ID NO: 231: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 398 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 231:



* - a 7a aa.cact.ac acatattcat acaaa: 



_* . ^ ^ ^ _ w ^ ^- ._, _a _ * ^ GOT A J . ^ ^ A AAAAL-A.A 



^ _ > ^- ^ w — . ^ w . w - . _ ^ ^_-..-.-A. . . Z ^ w- G 3 3 C . _ A G ^ ^ _ AG 
* - - - - ^ - AA G AAT ZZ AA 3 Z 7 3 CT Z Z ZZ I AT CT ~ 0 C ~~ ~ ~ 



^ ^- - o ^- ~. o 

- ^" - _7 o v.- O ^- 



360

GAGGGTGGAG AATCGYTTGA AGCCAGGAGG CGGAGGZZGC AGTGAGCGGA GATGGGGGCA 60

TTGGAGTGGA GCZZGGGCCk GAGGAAGGTT CGTTCTCAAA AAAGTTGGAA ATGTGTTGGG 120

AAGTAGGGGG AGGGCAAGGT TAAAAGCTAT GCAGGTGTGT CAATTAGACT TGTTCCAACT 180

TGAGAAGGTG AATTTTGCAT GTAATTGAAA TGTTGGAGAA GAAGTGTGGG AGTTTCATAA 240

GGGAGTTT7T AGATGCCAAT AGATTGCAGA 7AACCATATT GGTTAGATTA GGGGAATGAG 300

CATGGGATAG GTGCGTGCCA GTTGGTAGGA TAGCATGAGG AGGTTTCAAA AGTAACCSCT 360

TTAACGGTTA TGTCGAGTAT TTGGTAAGTA AGCAAGGT 398

(2) INFORMATION FOR SEQ ID NO: 232:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 272 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 232:

GGGGOTGCAG ACTGAGTTAT TTTATTTTGC 7ATTTCCAGT TTGAAGCTAO TATCATGGGC 60

GT7TAGAGTT ATACAAATGA CACTTACAAA AAATAAAACA CCAAGAOAC0 CAGAGTGAGA 120

-«-A^-_GG GGAGGGGGGA GGOTGGGAGO AGGGGGGGGC OGGCGGYTCA C0GCAGGGCT 180

A-GGGGCTCTG G0CCAGGTGT ZGGZAGZZZk GG 272

(2) INFORMATION FOR SEQ ID NO: 233:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2 6-* base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



^ * ■„ . 
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(A) LENGTH: 217 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 234: 

GGCCAGGAGC CAGAGGGCCC CGGGGCCACC CCTGCCGGGG AACGTGATGA CCAGAGTCCA 60 

GACAGTGTCC CAGAGAGGCC GCGGCCCGCA GACCGGAGGC TCTGTCTGCC CTNCGTGGAC 120 

GCCTCGCCAC TCCCAGGGAG GACGGCCTGC CCGTCGCTGC AGGAGGCCAC GCGGCTCATC 180 

CAGGAGGAAT TTGCCTTCGA TGGCTACCTG GACAATG 217 
(2) INFORMATION FOR SEQ ID NO: 235:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 221 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 235: 

AACTTTAAAG TTAGGATTTT AAAATATTTG TAACTGGCTA AATTTTAAAG TCGTGACAAA 60 

TAATTACTTA GGTTCAGAAA TATACACACA CTTACTCTTT AGCCAGTTTC TTTCAAGGTN 120 

TTACTGTCCC ATCAGATATC TAGCCATTTK CCTTTGCAAA TTACATACCT TCTTAAGAGT 180

GTATTTTTAA GATTATTACT TATGCTTTAT GATGATATAG T 221 
(2) INFORMATION FOR SEQ ID NO: 236: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 221 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 236: 

ATAAATGGGT TTCTCACTCC TTAGGGACAC GATTGGAAAC AATACATCCC ATGAACACAG 60

GTGAATGTCC CTGGTTATCC CTGAGCTGGG CAGTTTCACA CAATCANTTT TNCTCTGAGG 120 

CCAAAGTCTG TGGTTTGATC ATCTTAGCAG CTTCCAGAAC AGAAAGTAGG TTTACTTTGT 180

CTCCAAANTC TNATTCTCGG TGCTCAAAGA AGAATGACCT G 221 

(2) INFORMATION FOR SEQ ID NO: 237: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 251 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 237: 
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TTTTGCCATG 


TTGGACAGGC TGATCTCAAA CTCCTGGCCT CAAATRATCT GCCCAGCTTG 60

GHCTCCCAAA GYGCTGGGAT TACAGATRTG AGCCACTGCA CCCAGCCTGA CATGCCATAG 120

TTTCAGCATT TTCTTGGGCA ATGATCCAAG CTGAAGGCTG GTCTGAGGGA TCTSAAGAAG 180

CGTATGAGTT GGAAGAGAGG GACAGAAAGG AAGAAGACAT GTGAAGAGAG AAAAGGAAGG 240

AAGCTAGCAG AGGAATGCCC TCCAATAGAG ACTGCTGCCT GAAGCTCAGC CCCTCTGAAG 300

ATAGGTAGGC CAGGCTGGCT TAGCTGAGGC AGTGGGTTAG ACCAGCCCT 349



(2) INFORMATION FOR SEQ ID NO: 241: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 233 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 241: 

GTGCAGCGGT CTGCCTTCAT CTTTTAATGG CCGGTGCGGT ACAGTTAGTG GACAGACGGG 60 

GGATGGGACA CAGCAGGGGT GAAACAGGGC AGTCACAGCC GGGGCCGGGG ATCTGGAAGC 120 

GGGGGCGGTC CTCCCCCTGG AAACACCGTN TCTGGAAGGA CACCCTTAGG ATCCCCTGAC 180

CTCARGGTGC CACCCACACG GGCCTGG7GT TCTGGGAGGC CCGGCTKGAG TGA 233 
(2) INFORMATION FOR SEQ ID NO: 242: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 372 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY : linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 242: 

ATATGTACTA CATTTGGTGG AATACGCATG TACAATTCTT CAAAAATAGT AAAGAGCAAA 60

ACAAACAAAA AATAGTAGAA GCACTGGAGA AATACACTAT GGCATAAACT AGTTACGGGT 120

GGGATGTCAC ATGGACCATA TCTACACTCT GTGGCAACCT TCTTACCTGA CTCCAAAGGA 180

TCAGATAATC AAACAGGAAA TTATGGTAGG AAATCAGAAA ATTGAAGTAT GCATTCATAT 240

CCTAAGCATT TTATTTTAGC TCAAAATATA AAAATATTCA TCAGTTAGCC AAGCTTTTGN 300 

GATGAGAGAT CATAGCCTCC TCTTTGATAG GGGGTTTCTT GGGTTTCCTT GATTTCATGT 360

TTCAGAGTTT TT 372
(2) INFORMATION FOR SEQ ID NO: 243:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 256 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS : double 
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AGGGGGTGGG GCGGAGGGTC AGGAAAGCAG GCTCAGCTTC CAGGGTCAGG GAGTTGTGGG 120 

CCCAGAGGGG CTGTCACAGT GGATGCACCC TGCCCCCTCC CTCGCCAGAC CCGAGGGTAG 180 

GGCAGAGGCA CCTCCTCGNC AGCCTNTGGG CTGCACCCAC AGGGAATNGA GGGGAGGGGC 240 

ACCATTACCA CTGGACCCAC CAAAGACCC 269

(2) INFORMATION FOR SEQ ID NO: 247: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 297 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 247:

CTATTCAAAG TTTACTGACC TCCCCAGCCA GGCAGGCCAA CCCTTCCGAG CAGGGGAAAT 60

GTCCATCTAG CTGCCCTCTG CTGGGTTGCA GCCTATGCCA TGAGAGGGTA CTGGAAGCAG 120

GAGGGAGCCC TGGCTAGGGC AGGCCTTAAA CGCAAGGGAA GCTGAGCAGA GATCTGCACA 180

CTCAACCCCA TTTGATATTC TTCTCCTCCT CAGTCATGGC CAGCGTGTTG GTGACTAGAC 240 

CGGTGCCAAT AGTCCGGTTG CCATCTCGCA GGGTGAAAAG ATGGCCTTTC TCTTAAG 297 
(2) INFORMATION FOR SEQ ID NO: 248: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 281 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 248: 

ACAACAAGCA CACCAACTAT ACCATGGAGC ACATCCGCGT GGGCTGGGAG CAGCTGCTCA 60 

CCACCATTGC CCGCACCATC AACGAGGTGG AGAACCAGAT CCTCACCCGC GACGGCAAGG 120

GCATCAGCCA GGAGCAGATG CAGGAGTTCC GGGCGTCCTT CAACCACTTC GACAAGGATC 180

ATGGCGGGGC GCTGGGGCCC GAGGAGTTCA AGGCCTGCCT CATCAGCCTG GGCTACGACG 240 

TGGAGANCGA CCGGCAGGGT GAGGNCGAAG TTCAACCGCA T 281 
(2) INFORMATION FOR SEQ ID NO: 249:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 383 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 249: 
AGCGCATCCA CACCGGGGAG CGGCCCZACC CCTGCTCCTA CTGTGGCAGG AGCTTCCGCT 60 
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(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 252: 

CCTGAACAGT CTGTTTCATT TGACTGTTTG GGGGTCTCCC AGTTTAAGCA AGATATTTAA 60 

GCCTTATTTC TCTTGGCATG CTTGGATTCC CCAGTAAAAA AAACTCCTGC CCTGGGCTGA 120 

CAATCAAAGT TCTGGGAACT AATATGGATA AGCAAGCTGG AAATGGAGAA GGCTATTCAC 180 

TGTGCCTGGG TCCTACTGTT TTCTGGNTGG GAACTGCTTT TCCATTAGGC CTGGTGTGCC 240 

CTGGAAGGGA NGAGCCTCTT GCAGAGACTA CAATCTTGGA TGGGTCCTTT GCCAAGTTTG 300 

AAGGTAGGAA CCCA 314 
(2) INFORMATION FOR SEQ ID NO: 253: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 293 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 253: 

GAACACTCTG CTCCAGCCAA GGTGGTGAGG GCAGCTGTTC CTAAACAGCG CAAAGGCAGC 60 

AAGCCACAGT CCCACAAGCC TCAGCCTACC CGTAAACTGC CACCCAAGAA GGACATGAAG 120 

GAACAGGAGA AAGGAGAAGG GAGTGATAGT AAGGAGAGTC CAAAAACCAA ATCAGATGAA 180

TCAGGGGAGG AAAAGAATGG AGATGAGGAT TGCCAGCGAG GCGGGCAGTA GAAGAAAGGA 240

AACAANCACA AGTGGGTTCC ATTACAAATA GACATGAAGC CTGAAGTGCC CAG 293

(2) INFORMATION FOR SEQ ID NO: 254: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 413 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 254: 

CTTTTTCTTA ATATATTAAT ATTTACCAAG GCAAGACAGT GATTTATGGA CATTTAAATT 60 

ACTTTAGCTT TGTTCTGCTG TTCTAAAACA TTGTGTACTG TCTGATAGAC TTTTAAAAAA 120

CAGTGCTTTT CCAGGATGAT TTATGATATG CAGTATTGTT TATAGATGCC CATGGCTTAA 180

CCTIGAAAAG TCAATTAAGT GACACAATTA AGAGAGATAT GAATAGTGGT AGAAAAAGCA 240

TGTACTCTGG ATAAGTGGGG GTAAATCTAG TATTTGTTAT TCCTGTCAGT AATATTGTCA 300

NTAGTATTTT TTAGAAGGTT TAATITTTTT ATGGGTTATA AATTCATGTC ACTCTTCTGC 360

AATGGGTACC ATCAGTGGGA ATGCNGGAAT IATCCATGCT TTGGGGGTTA AAA 413 
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C;. INFORMATION FOR S£Q ID NO: 255: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 376 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 255:

GGGICCAGGG G A. G AA T C AA T ATATCTAGTA TAGTTTATAT TTGTACCTTC TCTCCTTAAG 

AGTTACAGTG AGTGACTCTA CTCCTCAAAT GGAGCACCTC TCTCCAGGAG AGTAAGAAGA 

TCACATAAAT AGAAAGTGAG CTTTGGACTC TAACAGACAT AGGTTCATAT TCAACTCTGC 

TAC77AATAT COA.TATTGGT TTGAGTTATT TAACCTTGAC AATCCACACT GTAAAATGGG 

.AAAYAATAA ATACCCTCCT CCCA.GAAGTG TTA.CAAAGTT TA.TA.TGAAAT AATGTG GTTA 

AA-AAGCT'-'GG TACATAGTAG GAGOTTAGTC ATTGTTTATT TTCTCCOTCA TAG C CAT A OA 

* ^ . ^ ^ v_ ^ A 1 ^ ± ^ 

(2) INFORMATION FOR SEQ ID NO: 256:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2-1 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 256:

^- . A ^ A ^ A - G G GOT OA CTATK TT G 0 C 0 AG G C T G G T C 0 T G AA CT C C T G AG G T A G G AG C- AT 0 G 

u ' — ^ ^ - - ^- C A ^- A C AG A G GTT G C A GT G A GZZG A GAT 0 A. C G C OA ZZGZ/-. 0 T 0 0 T G 0 CT G 

w ^ . ^ A A - A G . G A G A 0 T 0 T G TOTT AAA. 0 AA AA C AAAA 0 AA AAAAA. G G C 0 A. G G 0 G 0 A G G G G 

- * ^ ^ _ . — _ — _ .~ w - * ~ ^ -~ - Gu-^v. AA. G ^ ^ 0 G T G G A. T I A- 0 C T G A. G G T 0 A. G 

(A) LENGTH: -IS base pairs
(B) TYPE: nucleic acid

(D) TOPOLOGY: linear
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TCCTGGCTAA CACAGTGAAA CCCCGTCTCT AC T AAAAAT A CAAAAAAATT AGCTGGGCAT 240 

GG7GGCACGC GATTGTAGTC CCAGCTACTA GAGAGGCTAA GGCAGGTGAA TCGCTTGAAT 300 

CCAGGAGGTG GGGGTTTCAA TGAGNCCGAG ATCGTACCAC TGCACTCCAG CCTGGGGCAA 360 

CAGAGTANGA CTTCGTAACC CCCAACCAAC CCNCCAACCC CCCGCC 406 
(2) INFORMATION FOR SEQ ID NO: 258: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 157 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 258: 
GAAAAGAAGG AAGGAAAGAG GGGAGGGhGG GAGGAAAGGA GAGAGGGAGG GAAAGAAGGA 60 
GAAAATGCTG GAGGAAAGGA GGTTGGTTAC ATGATTTCTC TAATGGCAAT GAGCTGCTTT 120 
CTGGATGAAA TACAGAATCA GAGCGAGACT CCGTCTC 157 
(2) INFORMATION FOR SEQ ID NO: 259: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 361 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 259: 

AAGCAGATAT AAATGGGACC ACTGTGAATC AAAGGGGAAA AATTCCAGGA AAAAAAAATT 60

CCAATAGCTT CACAGTTTAA CTGAGGTTTT GGAAAAACTT AAGTGAATTC AGCTGATGTT 120 

TGAAATATCT GTCTACATTT AATTAGATGT GTTGTATTTA CCAAGGAGGC ACAAATATGT 180

AGTTCTGTAG ATTTTAATAC TAACTTTTCC AGTAAGAAAA ATAATACCAG GTGATTTCAA 240

AAAGGGCAGT GATCTATAAA CACTCAAAAT GCATCTTTGA ACAGGGGAGC AGAAATAGCT 300

AATTTAATGA AAACAAACCT TAAGCACTTT ACTTGGCTTC TAATAAGGCA TCCCAAGAAA 360

A 361

(2) INFORMATION FOR SEQ ID NO: 260:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 349 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 260:
CAATACATGT ATACAGTGIA CACTGATCAA AlAAGAGTAA TTAGCATATT TATCACCTCA 60
T7TCTTTTGT GGTGAGAACA TTTAAAATCC TTTCTTTTTG CTATTTTGAA ATATACAGTA 120
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(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 263: 

TGTATCAACT CAGAATTTCC AGAGAGCTCT TCCTGGCTGA AAAGATGTCC AAGGATCATC 60

TCCGGAATGG AAGAGGTGAG GCCTGTTAGC TTGTGGGCTG CCCAATCCAT CCAACCCTTG 120 

GCATTGGGAT CAATGTTGAT GAGGACAAGA CCTTCAACAG TGTCCGGGTG GTTAAGAGCA 180 

TATCTCGCCA GGATGTAGGC TCCAGCTCCA ACACCAACTC CAATTATTGT AGAGAAATTT 240 

AGGTACTGCA GGACGCAAGG GATCATGTCT GCAAGCTGGT CCAGAGATGG GTACTGATAT 300 

CCCAAAGGGA ACACAGGGGC TCCCTCTTCC ATTCCAGGGG CATCCACATG GACCCGCACA 360 

AAGTTCTGAA TGATTTCCTG CATGTCCTCG AACTKGAACA GTGGCTGGAG GAAAGATTTA 420 

TAGTTGAGTC CACATCGGGT AGGTAAG 447 
(2) INFORMATION FOR SEQ ID NO: 264:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 317 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 264: 

TTTTCGCTGT CAACAGACAG TTTATTCTAT ATACAAACAC AATTTTGTAC ACTGCAATTA 60 

AATAGAATGG AATGAGCGCT CCTCCGCATT CCTCCCCGAG TGACTGGTTT GGCCGCCGGC 120

CACTCCATCC CCGAGTGGGA CTGGACCACG GCCCTGGNTG CTGCCACTGA TGTTGGNGCC 180

TGCACCCCAC GTCCCTATGC CCGAGGCGCA ANTCTGCTCT CCCGGGGACC CCAAGNCTGG 240 

NGCACACGCG GGGAGGGGGG GGCCATGGAG AAGGCACTGC AGGGAGCACC AGGCAGAGCC 300 

GTGTTGAGGC CGGCCGG 317
(2) INFORMATION FOR SEQ ID NO: 265: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 270 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 265:

GCAGAGCAGG TGGAAGTGAT CAGGAACCAT AGTTGACAGT TCCAATCAGT AGCTTAAGAA 60

AAAACCGTGT TTGTCTCTTC TGGAATGGTT AGAAGTGAGG GAGTTTGCCC CGTTCTGTTT 120

GTAGAGTCTC ATAGTTGGAC TITCTAGCAT ATATGTGTCC ATTTCCTTAT GCTGTAAAAG 180

CAAGTCCTGC AACCAAACTC CCATCAGCCC AATCCCTGAT CCCTGATCCC TTCCACCTGC 240
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AGGACGGCGA CAAGCCCCGG GTGCTCTACA GCCTGGAGTT CACCTTCGAC GCCGATGCCC 240 
GCGTGGCCAT CACCATCTAC TTCCAGGCAT CGGAGGAGTT CCTGAACGGC AGGGCAGTAT 300 
ACAGCCCCAA GAGCCCCT 318 
(2) INFORMATION FOR SEQ ID NO: 269: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 422 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 269: 

ACATGTCTAT TCAGGTCTTT TGCCCATTTT GAAATAGCAT TGCTTGTTCT TTTGCTGGAT 60 

ATTAACCCCT TGTCAGGTGC ACAGTTTGCA AGTTACCTTT TCTCATCCTA TAGGTTATCT 120 

CCTCACTCTT GATTGTTTCT GTTGCTGTGC AGTAGCTTTT AAGTTTGGTG TAATACCATT 180 

GTGTTTTCTC TGCTGCCCTT TTAAGTTTCA CTGGGTCAAA AGTTTAAAAT TTGTGAATTC 240 

CTATATTTTT AGGGCAATTC TCCTGCCACT GTTGGAATTA TGCCTCAATC TATGCAGTAG 300 

AATATTAGTG TGAAATGCTT CTGTACCAAT GGAGATGATG CTGGATGGTC TCTATCATAA 360 

ACCCATACCT CATCAACACA AACTGCAATT ACACAAGGGC TCTATATCAT GGATCTCCAT 420 

TT 422 
(2) INFORMATION FOR SEQ ID NO: 270: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 376 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 270: 

GAAGAAGAGC CCAGACCTAG GGGAGTATGA TCCACTTACC CAGGCTGACA GTGATGAGAG 60 

CGAAGACGAT CTGGTGCTTA ACCTGCAGAA GAATGGAGGG GTCAAAAATG GGAAGAGTCC 120 

TTTGGGAGAA GCGCCAGAAC CCGACTCAGA TGCTGAGGTT GCAGAGGCTG CAAAGCACAT 180 

CTTTCAGAAG TCACCACGGA GGGCTACCCC TCAGAACCCC TTNGGGGCCT GGAACAGAAG 240 

GCGGCCTCGT CCC7GG7G1C ATATGTGCGC ACGTCTGTCT TCCTGCTTGA CTTTGGGGAT 300

CTCGATGATC CTGGTGCTCC TGTGTGCTTT CCTGATCCCC TGTCCTCCCA GAGATCTTGA 360

CAGAACTGGA GCCGCA 376



(2) INFORMATION FOR SEQ ID NO: 271 




■~ ; nucleic a— > 
----ACGTT =CCTTtc-tt r -„ 

-ccrrrrrcc — c - c — — ^^c,- tatctatact 

..TAGCCTCA CCT-TA^G- --o 

-ccacgtt GArcAicAAx tt^cgs- iAlGACTGTK 

CA-TGT-C 4 — „ T — rTATGA AAAAGAATCA 

— ^GTAG GAAATTTcrT r^rv,- 
rATGriGCTA TCA^rr... _ -^A, T . CTGATGGGAA 

-CAGGAGAG G7ATA.ACGCT -~ G ^ 



^GACTCCIC 

CCCACTAAGA 

GGCACGATrG 

AGTTGGNCAC 

ATTTGAAGTG 



(2) INFORMATION FOR SEQ ID NO: 272:

(i) SEQUENCE CHARACTERISTICS:
— ' LENGTH <o/ v " b • 

•B. TYPE: ;.;^-_ D&Se .^irs 



(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 272:

SAGTCGGA 'CZC1ZG'----~ . 

~~ CiGGG -GGAGTGCA 

— ~-.C GCCTCCCAGG — 

—.A TIAGAGGCAC CrCGCAGGAC !Cr ^., „ — 

1 '-AGAA.rrr"" ^aa^- - * - 

- ' — ACr^^^AA-- 

v„ .CAGat,--- — ^ 

— "-^oAATG G~~^a~- 

^ZAAAGTTT ' 

— ASGrrrA ctga 



^ *" - - C C 2 



--^ACT GGC 



- - * CCAC 



ATGGTGCAA.T 
CAGCCTCGIA 
^TTTXAGTACA 

- - * AAGGG7T 
- vr - - - ^AAAG 



60 
120 

I C 

240 
3 CO 
34 5 



• * 
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259 



120 

180 

240 

300 

348 



TGCTGCAAAT AATCCCACG 
(2) INFORMATION FOR SEQ ID NO: 274:

(i) SEQUENCE CHARACTERISTICS:

(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

SEQUENCE BMBffl.: SEQ I» "^'V^ mcATICC A «> 

.cggagxxgt ccggattgta agxgaaaggg tcgaaxax * o-^n 

n- tt r ttt r TTT GGCTGGGATA GAGGGGTLAA ^ 

~ T " «™ tttttagtgc attagxgggg 

tot ca=acc XACcaGcxx ™- tccaataggc 
aggagggxgg gcagagaxaa — tcttccaggt agcxgaaagg 

GTAGTATCCG GACAGAGCAC GTTTGCAGAA b 

ri-GX GACGXACXCX GGGTXAGGTX AGGACXXGCC CTCGTGGT 
(2) INFORMATION FOR SEQ ID NO: 275:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 396 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

<■» " E DESCM ;^;rG~ -AAACAGG AAACXACAAG 
~ — GGGCSGTG iCCGCIGCI C TCAGGCXGCC 

TGCCCCTTCG CCCCCAGGTC ACG G ^ 

cagtgtggac gxgccxgt g G^acc ^ ggtctccccc tgictotca 

GGACAXGCAC GXXGCCXGXA ****** 
GGGCTGCTCC TNTTGGNCCA G^ - ^ 

CCTGACTATT CAGCTCACAG TG.CCACCCA 
GTCTGTTAAC TGGCAACATA CTGGCAGCCC ATAACT 
(2) INFORMATION FOR SEQ ID NO: 276:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 381 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

C*i) SHQ-CE DESCRIPTION : SEQ ID ^ CCCGNCCCA C «> 

»rrr-GCGCA AGGGGGCGAG CCCGGGCAGC CC-CCAA . 

GGTGTCGGGG AGCC.GCGCA TGGC GGCCCA GGCTGGGGAA ^0 
CCGCACCCAC CGCCGCCCCA GCAGC^C- 



60 
120 
180 
240 
300 
360 
395 
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GGTGTACCTC TTGGCTGGCA AAGCCAAGGC CAGTGGGNAC TTGTATAAAT CACATGGGTA 180 

TGTTCTTGGT TCAGTGATCT TGGAGTGATG ATGGTAACTN ATGAACAGAG AACTTTYYAG 240 

AACTTKGGTC CTGTCTTCCT CCCTGAACCT AGACAAGTTT CACCCCTCCT CCTGTACCCA 300 

ACCCCATT 308 
(2) INFORMATION FOR SEQ ID NO: 280: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 402 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 280: 

ATTTTAGCAG CTTTCTTGAA ATTTAAAATA TATGTGTAAG TATCTCATTT ATATGCATTT 60

CTAGTTTCTT TATACAACAG AATAACTTCT TTTACATCAA ATTTCTGAAT TTGACTAAAT 120 

TTAGAAATAA TGGAATCTCA TCCATTAAAT ATAGTCATAG AAGGAAGGAA ATATGAAAAT 180 

TAGGATTTCA GATGTTTGAA CATAAAAGAT AATTTTAAAC ATTGTCAGTA ATCTATTTCT 240 

TTTTTTTTTC GAGACGGAGT TTTGCTCTGT CACCCAGGCT GGAGTGCAGT GGCGCGGTCT 300 

TGGCTTACTG CACCCTCTGC CTCCCAGTTC AAGTGGATTC TCCTGCCTCG NCCTCCTGAG 360
TAGCTGGGGT TACAGGGGCA TGCCAACATG CCGGGGCTAA TT 402
(2) INFORMATION FOR SEQ ID NO: 281: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 313 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 281: 

GAGAATCCGT CTTAAAAAGA AAAAAAGAAA ATTATAGAGG GAGATGAGGT GGGACAGAGT 60 

CTGGCAGTTC ATCAGGGGGA CTGAGAAGGT GGCATTTGGA GGAGAGGAGG CAGTGAGCTG 120 

TGCAGTGTCC AGGCAGCCAC CCTTCCCAGC GGCCACCATG ACGGTGTCCT CATTGCTTTA 180 

ACCATTAGTA ATCATTCATT CATTCATTCA TTTATCCGAC GTCAGCTGGA GG^CCTGCCC 240

G^GGGGCAIG CGCTTAGATT TNGGAGGCCT TCCGGGATGC TTGCGCTCCA ACGGGGGAAG 300 

GCCGACTTGG GCT 313 
(2) INFORMATION FOR SEQ ID NO: 282: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 217 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 
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;xi) SEQUENCE DESCRIPTION; SEC ID NO: 232: 

TGACCTCAGT TGATCCACCC ACCZZGGCC: CCCAAAGTGC TACTATTATC GGCGTGAACC 60

ACCATGNCCA GCCGAAAAGC TTTTGAGGGG CTGACT7CAA ATCCATGTAG GGAAGTAAAA 120

TGGANGGAAA TTGGGGTGGA TTT7CTAAGG ACC7TTCTAA CANATGGCTA TAATNTAAGG 180

GGTTTAGGGT CCTTTTTTTT TTTTCAGGGA TACATTT 217 

(2) INFORMATION FOR SEQ ID NO: 283:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 327 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 283:

TACAGAGCGC TTTACTSCTG GTCCCATGGC GTAAAGATGT GGGZGGGZZZ G^ZAAGGCGZ 60

AGOCT00ACT CTTAAGATGG GCACAGAAGG GCAAGAAGIA AGATGACGAG TCCCAGAATT 120

.-.v;w-.--rA ! J ; .w A-GAGC : _AAG GCCTGGTOTG AGCAAGGGCA GCCCCCTGTC CCAGACACAG ISO 

^. A ^ ^ ^ o ^ i-wTiAL.^j G A C A. A G C C A A C G T G G G G G G A T 0 0 T 0 C C G G G 0 0 T G G G C C T 2 — 0 

^ ^ ^AA ^ . ^ T 0 C T G C A. G G A 2 C CT G 0 C ATT G T G CT C AAAT C A C AA C 0 ATTT TTT G CT T C C A 2 2 2 

— ^ — — - . l- . oC t ^L \^ ■ ^ A G T L: A. G T j 2 ~ 

(2) INFORMATION FOR SEQ ID NO: 284:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3^0 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 284:

v.- ^ — - - - - .~ - > ^ _ ^ A. A, A.A. A. - A. G . _ A* * C 0 A_AA_AA 

. - — * ^ ^ . „ ^ * - - — - — ^ ^ ^ _ T _ A. 0 A. 0 A„A G . G A_AAA. A. . A. ^- A. 

-w — — - - - - ~. ^ ^ A^ .^U-_A — ~ A. * * A. * * ^ w A. w A. - A. A. * - 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 285: 

GACATTCACG GAGGTGGGTT CGACCTCCGG TTCCCCCACC ATGACAATGA GCTGGCACAG 60 

TCGGAGGCCT ACTTTGAAAA CGACTGCTGG GTCAGGTACT TCCTGCACAC AGGCCACCTG 120 

ACCATTGCAG GCTGCAAAAT GTCAAAGTCA CTAAAAAACT TCATCACCAT TAAAGATGCC 180

TTGAAAAAGC ACTCAGCACG GCAGTTGCGG CTGGCCTTCC TCATGCACTC GTGGAAGGAC 240 

ACCCTGGACT ACTCCAGCAA CACCATGGAG TCAGCGCTTC AATATGAGAA GTTCTTGAAT 300 

GAGTTTTTCT TTAAATGTGA AAGATATCCT TCGCG 335 

(2) INFORMATION FOR SEQ ID NO: 286: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 399 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 286: 

GCACAATTAT TAAAAAGAGG CCACTTAAAT TCAACTCTCC ATGGATACAG TGTCTGTGGC 60 

AATGTTTAAT TAGAGATTAA AATTGAGGAA TTGAATAATT GAGGTTGCTA ATGAATTTGA 120 

AAACTCAGCA AAGCAAGGAG AGCTGAGCGT TTTTCCGACT TAGCTTTTCT TTCTCTAACC 180 

CTTTTCTCAT TTCCTACTAT TATCACATNT CTGGCCTTGA CTGCTGAGTT TATTACTACC 240 

CATAACCCTG GCCTAAGTGG AAACAAAAAA GCTGTAGCCT CTTTGCTGAG CTCCTGGAGA 300

CATTTGGTCT ATTGGATTTA TGACATGTTC AGAAGCTTGC AGTTGCAGGA GGCTGACAAT 360

GATGAAAATG AGATATGNTG GGCCACCACG CTTTTCTGT 399

(2) INFORMATION FOR SEQ ID NO: 287: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 294 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 287:
TTCCAGTTGA ATTCACCAGT GGACAAAATG AGGAAAACAG GTGAACAAGC TTTTTCTGTA 60
TTTACATACA AAGTCAGATC AGTTATGGGA CAATAGTATT GAATAGATTT CAGCTTTATG 120
CTGGAGTAAC TGGCATGTGA GCAAACTGTG TTGGCGTGGG GG7GGAGGGG TGAGGTGGGC 180
GCTAAGCTTT TTTTAAGATT TTNCAGGTAC CCCTCACTAA AGGCACCGAA GCTTAAAGTA 240
GGACAACCAT GGAGCCTTCC TGTGGCAGGA GAGACAACAA AGCGCTATTA TCCT 294
(2) INFORMATION FOR SEQ ID NO: 288:
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Ci;- SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 91 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 288:

TCTACACATG AGGAAAGCAA GCCTCAAGCA AGGGGGGCCZ GATCOTTTCC CTGTTOCCTG 

TCTATTCOCT GTCTGTGGCA AAGCCCATTG CCTTGATTCT CTTCTCTTTA C7TTCATGTT 

GAGAAGTAGT TTCTTTOTGO AGTTTATTTA ATTTACTGGC AAAATGACGT ATTTTTTTTT 

C-.^AA^G-T T--AGGTAGAT ATTTGCTTTA TGCATGTAAT GTCAATGAAG TAGTCATAAG 

T u^ AA G AA A T G A C T G A. T A T AAAT C A T G T GTTCGACT A 0 A T A G T C T AAA T A T T T A G T A T 

TTGGTCATCC ATTTTAATAT GTTCAAATTC T G T T AAA C AA GNCATAGTCA CTATGTGAAG 

- - ~ . i - .A- A. G ^ ^ G ^ A T _ A T G A C TT T 



(2) INFORMATION FOR SEQ ID NO: 289:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19S base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 289:

:ttatattot aotttatttg gtaa^aactca gaa^agtaaca attcacatcc TCCCACCTTC 

* ^ * i ^ ^ o ^- A- A. l: AA G G w A G T TT G 0 A G AG A. C AA_AA C G G 0 T 0 TGGCGTGGGG A.T C A.T C C A. C C 
. ~ v_ ^ _ w ^ ^ - — ^ : _. ; w ^ L- o w ^ A. w u C A. ^ G G 0 T T G G C A. G T C A. G G C 0 T C T A G G C T G A.T T G 
- ~ -~ ~ .-^A - A. AA 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
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(2) INFORMATION FOR SEQ ID NO: 291: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 163 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 291: 
CCTGGTAGGC CTGCTACACA GTCTTGCAAC GNCCCTCGTG CTTGGGCTTC TGCGGTGAGG 60 
CAGGGGAGTC TGCTTGTCTT AGATGTTGGT GGTGCAGTCC CAGGACCAAG CTTAAGGAGA 120 
GGAGAGCATC TGCTCTGAGA CGGATGGAAG GAGAGAGGTT GAG 163
(2) INFORMATION FOR SEQ ID NO: 292: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 397 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 292:

ACGGGAAGGT GAGTATGTNA GTATGTNTGC CAGACAATGG TGTTTCCATG TCAATGGAGG 60 

TTTCTCAGAG AGAGGTGATC TGGCTGGAGA AAGCTTAATC TGGTGGCAAT GGACAGGTGA 120 

CTTTAAGAAG TGGGGAACGA GGGAAGGAGG CCAGTTTGAA AATNATAACA AGGGTCCAGA 180 

CTCAGTGATG CAGCAGTGAC CATGAGAACA GAGCAGCTGC AGGTAGAAGA TGGAGACAGA 240 

ACTNGGGAGA TCTGGTGGAG GTAAGCCGCG TGGAAAGATG ATGTCAGGTT TATACCTAGA 300 

GGACACATGA TCCATTCACA AAGCCAGGGG NAACCTAAAG AGAAAACACT TAGAATTTTN 360 

GGAGAANAGG CTAGGGCTGG GCCTTAGACA TGGGCTG 397
(2) INFORMATION FOR SEQ ID NO: 293: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 360 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 293: 

GAGGTAAAAT TTACATACAG TGAAATCCAA ATCTTAAGTG TACCACTAGA TAAATTTTGA 60

TAAATGCATT ATGCCTGGTC TTCACACACC CTTTTCAATA TATAGAAAAT NTCCAGATAA 120

TTTATTTTGT TGTTTTTTTC ACACACTAAG TTCTAGACTT TTCCAGGTCC GAGGGAACTA 180

TTAGGGGGGA AAGTACTTGT NATAGTAAAA AAGATTTTAG GTGTGTTTGT TTTTAAGGTG 240

CAGAAACACA TCGCAGATTT AAGGTCTGCA ATCTCTGCTT TTTGTTATTG TTCCAGTTTT 300

GATCTCAGTG ACATTACAAG CAAGCAGAAA CACTCAGACA TGAAATGGCC CAGTGCCTGT 360
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(2) INFORMATION FOR SEQ ID NO: 297: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 244 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 297: 



AGTACGGTTN NCGCTNAAGC TTGATNATCG RATTGCCAAT CTNCATATTT GTGTTAGAAT 60 

CATTTGTTTT TGTGTCTTCA TGTTTCTATA AGATAGGACC AATATTCTTT ATTGGGCTTT 120 

GATTTTATTT TGTAACTTAA ATGTATTAAG GCAATAAATG TAATTTTCCA CTNAAAACTA 180

TCATTATAGA TTTGGTTACT ACCTACTGCT CAGCAATTTT TTTTCTTATC AAAATTCTTC 240 

CTGG 244 



(2) INFORMATION FOR SEQ ID NO: 298: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 152 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 298: 
CCTGAACAGG TAATGAGAAA AATTTACACA CAAGTGATTT TGAAAACAGA ATGGGTTGCT 60 
TACAAATTAC AGGAAATGTT ATAACACAAA CCAGAAGAAT TCAATGGAAG GCAATAAGGG 120
ATTCTGAAAT GAAAATTATA AAAGTATCAN GA 152
(2) INFORMATION FOR SEQ ID NO: 299: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 374 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 299: 

CGATGTTTTT AATGTCATCA CACGTTGTCT CAAAATGAGT GGTGGCATCA TATGTGCGGG 60

AAATAAAGAT CTGGCTTTCT GTTCCCAAGT CTTTTGGTAC CAGGAGGTCA CTGATGCTAA 120

CAAATTTCTG TTCAATTGGT TCCAAGAGCT CCAAAGCTGG TCTGATTTCC TTCTCAGGCT 180

CCTTGGTTTC CACAGTTGTA CTAACTATAG CAATGTACTT CCCTTGTGCT GCTACATTGT 240

GCGCAAAGGA GATCATGCAG ACGTAGATAT CTGACTTTCG ATTGACTTTG GTTCTGTGGA 300

ATAATGATCT GGCAGGAGTT GGCATCATTG GTCTTCTTTG ATGGGGGTGG CTGAGGGATG 360

CAAATAACCT CTTG 374 
(2) INFORMATION FOR SEQ ID NO: 300: 
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APOLOGY- .„ fi 

c — -c GAGC ; ^r- GCA ^;; ; : — 

A7GCC ATG^C'— ^AGCGAC T^,__ 

^--GACaa " ^GCCCAGC- — - — — TG 

• ^ iCs ACMG A ^ r „„, * ^^AACGAG Cr- - ■ ~~ 

AAGG Arcrc _ — C -AGT GCCACAGr^ r .„ — GGTA 

-.-iGGag crCAGAACrc ArcT£c .;: CAiAGAG ^ ^GGAGGGAG 



( 



* SEO 



:o ■ 



LENGTH 



OTP, 



-Rand 



nu 



cle: 



, ££e Pairs 
■ c acid 



^ooeng; 



>ES; 



.1 bCC 



-OAA-j 



: ^o?r:o:, 
gagaa 



SEO 



^OG" 



ooogga: 



- AAA 



aoag 



^OAGT GG. 



•^oagac 



- CTATC 



^AGAOI 



oaa: 



■^AG.AN' 



?0: 



OAAAA 



ATGAA 
- AAA C 



CAGG 



■^GAG' 



-AT. 



10 N 



60 



300 
360 



1EV;7" 

00:' D ~ ' 



OlARj 



363 b 



:sr; 



00S : 



LOG: 



-OA 



deub 



near 



' -"OOGGG 
- A G G G ~ 
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" • r rCCCAACTAT GGAGGAAGCA 

^ccxn ****** taagtttctc ccccaa 

CAAG^CTATT AGAAACCTTT 

^ ***** DESCUrI1 " t G^ GCAGGGGTT G 
iT GC^S — 

ATACAC-GAGC AAOi 

CTCCCACACA TEA q&; 



360 
363 



CAATCCATCT 


60 


GCCATARCAT 


120 


TA.TGCA.CCCA 


180 


; g^GGCTIAGG 


240


253 



:8F0B ^TIO B FOR ^ * 

® = ID ,,,.30.: 

^ SB *«« - SCU G ^ m CGGGGGCXCG 
TXnr** — " G c "" TO yvGAGGTTCT CCTC*** 

GGCT GG^ OCGGC ^ ^ GfcccTIMCT 

CGK GnTC. CG^CGC C^CX GG ^ GG GCftGG 

cctTG cG OT — « «^ r*»**° 

temecnB ^ ^ ^ — «* 
ni GGGCGTT OOCT^C 

xrr^ CCTCC ^= 

^.r T r C^GGCCIT GA^- 



CGCGATCTCG 
CC1CCCGGGT 

xtagtggaga 

GATCCGCCAG 
CTGTTTTTTA 

ticcttaacc 
ttaagg 



60 
120 
180
240
300 
360 
416 



TGGCTTGGCT 
^/^rTGGGATi 
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A. - AG G.^ u- A ^ . w Aw C OTG C CCAG C G GAGA CTTTA.TTGTT TTAA.TTA GAA A.TTTT A. G GI A 1 £ G 

AGTGGTTTGG GCACATTTAT ATTTGCACAC TTGTGGTAGT GAG 22 2 

(2) INFORMATION FOR SEQ ID NO: 306: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 169 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 306:

- ~ _ -l ; _> ^uA _ A^^GGZGAGG CTGGTCTGGA ACTCCCGACC YYGIGA^GCCA. GCTGCCTTGG 6 2 

GGTGTIAAAG TGGTGGGATT ACAGGCGTGA GZACGACGCC CGAGCGATAG CTCTTTACAA 120

GTGGCTTGTA AAGAAAGGAT CATTTGGCAC TGTTAGTATT TGTGTTGAA 169

(2) INFORMATION FOR SEQ ID NO: 307:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 303 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 307:

~ ^- i A G A o A G T A, I G T 2 A G G AA G A C AA GTGAGATTG G C A. T T TT AAA T .AAA. G T T G T A 2 6 2 

A _ GAA.GAATA A.TTGGAA.TGA. TGAGGTAA.TT TTTTTAAA.CA. AAGGTTCTTG ATTTAGTGTT 122 

A. ~ GA.T . G G AA AAAAAA T T A G AAAA> T AAA. G T AA G T S C C A T A. G G C T AA. T T AA AAAA. T AAAA. 2 2 5 2 

^ .u^o ^ u- \_- ^ ^ u ^ ^- * G l- ~ ^ - A. C o 2 2 . ATA A.T CO C AG C AG TTT G G G A G G 2 C G A G A 2 G G G 2 2-2 

-~ ^ r - - - ~ * ^ ^ - - A. o ^ A. o A. . ^ ^ A. G A C w A. — 3 ^ G 7 T AA. C A. G G G T G AAA. 2 2 2 C A. T G T 3 T A. 2 2 ^ ^ 
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AAGTTACATT 
AAAGTTTCTG 
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CA) ^ T VcUic acid 

cccacgcxca ^ccccac TGGGGA^t jccTGCCTTA 

CGTI CTTCCG CTCAAATCCT GATCT 
MAGMTACA GAAGAAlCC 

(xl) 

OT «GTACG ACCTCTTCCT GAAl ACCC IATGTA 

ggct^atctt gtacijCTTTa t ^ ataTGGTCCC 

ccctatcagt actaggtgta ««n ^ nGcccMci tGtAGAGGC G 

icmGCIG A .AAACC *T TACGGAGAAG GXCACAACAG 

TGGAAG 

(2) INFORMATION FOR SEQ ID NO: 311:

(D) TOPOLOGY: linear

(xi) sz^c, ^ agtcagagaG CTGTG , CT , 

7CGACTCGGT CCTCCtfWTC 

TCCAGCTGAC CCCTCTCTG GG.-^ TC , TGAACAC CTTCGCGTC 
GCAAGATCCG GGAAGAG TAC ' C * G — * ^ iCGC CAC CCTKTCGGTC 
. rTGTC .GACACGGTR GTGGAG.C^ «CA. 



60 
120 
180 
199 



atccctgagg 
ctcggaactg 

ACCTTCACTG 

tttgttgcag 

GCACTGGTTT 
CTATGGCTTT 



60 
120 

180 

240

300 
360 
420
426 



1TCCAGGGCT 
CTGCTCATCA 
A TGCCCTCAC 

caccagctgg 
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SE0YEN2E CHARACTERISTICS : 
(A) LENGTH: 329 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear



xi- SZZT:z:;ZZ DESCRIPTION: SEQ ID NO: 215; 
- - - - . ^ v.* u- A~_-. . .-A. ^ L: _ A - - AATTT 



* - ^- _ _ -w- v.- w ^ ^ vj: ~. ^ ~ ^ ^ ^ A - 3 A _ T TT I 3 I 3 A. 3 A T 3 T T A 3 I A 3 T 3 3 T T 3 AA 

— _ ~ . w - _? . » ^- _ _ ^ ^ _ v- > ^- * AA . . . A 3 ^ A A 3 T 3 A 3 A T T A 3 T AAA AT 3 A T T 3 3 T 3 
~ ~ _ ^- _ ^ ^ ^ a _ 0 ^ ^ AA ^ AA AAA. 3 3 3 A 3 T T 31 .AA 3 33 .AA 1 AA T 3 3 T .AAA- 1 . 
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TGGAAAACAC AGATGAAACC TA3TGCATTG A C AA C G A G G C CGTGTATGA3 AT I 

G3ACC3TGAA GCTGACCACC CCCACCTACG GGGACCTCAA CCA3GTGGTG 7ZGGZZAZZA 

TGAGCGGGGT AACAC3TG 3T TGZGZZZYZZ GGGZCkGZTG AACGAGA3GT GGCAAAGTGG 

GGGTTGAC.AT GGTGCCTTT I CTGGCTGAAT TTTTAATGCC CGGTTTGGG3 CCTACCAGCC 45 

^GGl; AA G 3 A 4 3 9 

(2) INFORMATION FOR SEQ ID NO: 312:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 302 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 312:

u . . u . _ _ G C A l- ^ C ^ AAT 3 A T T G T T T T T A G AAAA G G ATA T A C ATT G A 3 3 TT 3 AA. T G T AA 6 2 

T AA G AAA.T G C AA3ACTTTA3 GGTGTCCAAC TGCTAAGATT TATTTCCAAC TTGTCAGACA 122 

CAACCATTTT GCCCAATCCA AA. T C AAA G G G AATCAAGGCT GTGAAATC3A OA. C AGO AC AT ISC 

u C ^- - C A ^ A _ AAA. T G AAA A C T A C A 3 A T G T G T C A. G A G G C AA. 0 C A T A. T A C A C A C AAA T AA 2 - 2 

i l- _ _ _ A „ ^ AAA . C w A T AA G T A G C T G T C 3 A G G G AA T A C T T T C C AAA T AA. C C T T C A G C 2 21 

A 3 " ~ : r 

(2) INFORMATION FOR SEQ ID NO: 315:




Note regarding Claims: Certain SEQ ID NOS are excluded from
some claims based on their homology to known non-human
sequences (see Table 2).
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WHAT IS CLAIMED IS : 

1. An enriched oligonucleotide having a sequence 
designated as one of: 

SEQ ID NO: 1 - 315;

or having a sequence complementary thereto. 

2. An enriched oligonucleotide having a sequence 
designated as one of: 

SEQ ID NO: 1 - 315, except SEQ ID NOS: 22 or 157;
or allelic variation or complementary sequence
thereto or portion thereof at least 15 nucleotides in
length.

3. An isolated oligonucleotide that includes a sequence
designated as one of:
SEQ ID NO: 1 - 315, except SEQ ID NOS: 22, 187;
or allelic variation or complementary sequence
thereto or portion thereof at least 15 nucleotides in
length.

4. An enriched or isolated oligonucleotide operably
coding for a human gene product, which includes a region
coding for the same amino acid sequence as the coding region
of a gene corresponding to a sequence designated as one of:
SEQ ID NO: 1 - 215.

5. The sequence of Claim 4, wherein said SEQ ID NO is
listed in Table 6.

6. The sequence of Claim 4, wherein said SEQ ID NO is 
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corresponding to the EST identified as: 

SEQ ID NO: 1 - 315;

or a sequence complementary thereto or comprising an
allelic variation thereof.

11. The oligonucleotide of Claim 10, wherein said SEQ ID
NO is 1-315.



12. The oligonucleotide of Claim 10, wherein the SEQ ID
NO is 1001-1500.

13. The oligonucleotide of Claim 10, wherein the SEQ ID
NO is 1501-2000.

14. The oligonucleotide of Claim 10, wherein the SEQ ID 

NO is 2001-2421.

15. The oligonucleotide of Claim 10, wherein said 
sequence further includes the entire sequence designated as 

any one of SEQ ID NOS: 1-315.

16. An enriched or isolated oligonucleotide fragment 

comprising at least 15 bp of a sequence of Claim 10 and 
wherein said SEQ ID NO excludes NOS 22 and 187. 

17. An enriched or isolated oligonucleotide sequence
corresponding to a human gene, which hybridizes to a
sequence designated as any one of SEQ ID NOS 1-315, except
SEQ ID NOS 22, 187, or to a sequence complementary thereto,
under hybridization conditions sufficiently stringent to
require at least 97% base pairing.
18. An oligonucleotide according to any one of Claims
1-17, in substantially purified form.

19. A construct comprising a vector and an 
oligonucleotide according to any one of Claims 1-17.

20. The construct according to Claim 19, further
comprising a promoter operably linked to said
oligonucleotide.

21. A panel of at least 100 oligonucleotides according
to Claim 3 or Claim 16.

22. An antisense oligonucleotide capable of blocking
expression of the gene product of any one of the sequences
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of Claim 10.

23. A triple helix probe capable of blocking expression 
of the gene product of any one of the sequences of Claim 10.
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